Abstract The purpose of this paper is to estimate the intensity of a Poisson process $N$ by using thresholding rules. In this paper, the intensity, defined as the derivative of the mean measure of $N$ with respect to $n\,dx$ where $n$ is a fixed parameter, is assumed to be non-compactly supported. The estimator $\tilde f_{n,\gamma}$ based on random thresholds is proved to achieve the same performance as the oracle estimator up to a logarithmic term. Oracle inequalities allow us to derive the maxiset of $\tilde f_{n,\gamma}$. Then, minimax properties of $\tilde f_{n,\gamma}$ are established. We first prove that the rate of this estimator on Besov spaces $B^\alpha_{p,q}$ when $p \le 2$ is $(\log(n)/n)^{\alpha/(1+2\alpha)}$. This result has two consequences. First, it establishes that the minimax rate of Besov spaces $B^\alpha_{p,q}$ with $p \le 2$ when non-compactly supported functions are considered is the same as for compactly supported functions up to a logarithmic term. This result is new. Furthermore, $\tilde f_{n,\gamma}$ is adaptive minimax up to a logarithmic term. When $p > 2$, the situation changes dramatically and the rate of $\tilde f_{n,\gamma}$ on Besov spaces $B^\alpha_{p,q}$ is worse than $(\log(n)/n)^{\alpha/(1+2\alpha)}$. Finally, the random threshold depends on a parameter $\gamma$ that has to be suitably chosen in practice. Some theoretical results provide upper and lower bounds of $\gamma$ to obtain satisfying oracle inequalities. Simulations reinforce these results.
1 Introduction
1.1 Motivations

Statistical inference for the problem of estimating the intensity of some Poisson process is considered in this paper. For this purpose, we assume that we are given observations of a Poisson process on $\mathbb{R}$ and our goal is to provide a data-driven procedure with good performance for estimating the intensity of this process.



This problem has already been extensively investigated. For instance, Rudemo [3j| studied data-driven histogram and kernel estimates based on the cross-validation method. Kernel estimates were also studied by Kutoyants [29] but in a non-adaptive framework. Donoho fitted the universal thresholding procedure proposed by Donoho and Johnstone [16] for estimating Poisson intensity by using Anscombe's transform. Kolaczyk [27] refined this idea by investigating the tails of the distribution of the noisy wavelet coefficients of the intensity. Still in the wavelet setting, Kim and Koo [25] studied maximum likelihood type estimates on sieves for an exponential family of wavelets. For a particular inverse problem, Cavalier and Koo [10] first derived optimal estimates in the minimax setting. More precisely, for their tomographic problem, Cavalier and Koo [10] pointed out minimax thresholding rules on Besov balls. By using model selection, other optimal estimators have been proposed by Reynaud-Bouret [31], who obtained oracle type inequalities and minimax rates on a particular class of Besov spaces. In the more general setting of point measures, let us mention the work by Baraud and Birgé (3|, which deals with histogram selection with the use of the Hellinger distance. These model selection results have been generalized by Birgé [6], who applied a general methodology based on T-estimators whose performance is measured by the Hellinger distance. However, as explained by Birgé [6], this methodology is too computationally intensive to be implemented. Related works in other settings are worth citing. For instance, in Poisson regression, Kolaczyk and Nowak [28] considered penalized maximum likelihood estimates, whereas Antoniadis et al. [2] and Antoniadis and Sapatinas [3] focused on wavelet shrinkage.

For our purpose, it is capital to note that in the previous works, estimation is performed by 
assuming that the intensity has in practice a compact support known by the statistician, [0, 1] 
in general. Actually, procedures of previous works are used after preprocessing. The support is 
indeed assumed to be in [0, M], where M is a known constant given either by some extra knowledge 
concerning the data or by the largest observation. Then, all the observations are rescaled by 
dividing by M so that observations belong to [0, 1]. But all the previous estimators depend on a 
tuning parameter, which therefore depends in practice on M . If M is overestimated, the estimation is 
poor. Even taking the largest observation can be too rough if the distribution is heavy-tailed so that 
the largest observation may be very far away from the main part of the intensity. These problems 
become more crucial if one deals with data coming from other more complex point processes (see 
fl9| or [ill) where one knows that the support is overestimated by the theory and where the classical 
trick of using the largest observation cannot be considered. Consequently the assumption of known 
and bounded support is not considered in the present paper. 

Let us now describe more precisely our framework. We begin by giving the definition of a Poisson 
process to fix notations. 

Definition 1. Let $N$ be a random countable subset of $\mathbb{R}$. $N$ is said to be a Poisson process on $\mathbb{R}$ if

- for all $A \subset \mathbb{R}$, the number of points of $N$ lying in $A$ is a random variable, denoted $N_A$, which obeys a Poisson law with parameter denoted by $\mu(A)$, where $\mu$ is a measure on $\mathbb{R}$;

- for every finite family of disjoint sets $A_1, \dots, A_n$, the variables $N_{A_1}, \dots, N_{A_n}$ are independent.

The measure $\mu$, called the mean measure of $N$, is assumed to be finite to obtain almost surely a finite set of points for $N$. We denote by $dN$ the discrete random measure $\sum_{T \in N} \delta_T$, so that we have for any function $g$,

$$\int g(x)\,dN_x = \sum_{T \in N} g(T).$$

We assume that the mean measure is absolutely continuous with respect to the Lebesgue measure and, for $n$ a fixed integer, we denote by $f$ the intensity function of $N$ defined by

$$\forall x \in \mathbb{R}, \quad f(x) = \frac{\mu(dx)}{n\,dx}.$$

We are interested in estimating $f$ knowing the almost surely finite set of points $N$. The parameter $n$ is introduced to derive results in an asymptotic setting where $f$ is held fixed and $n$ goes to $+\infty$. Furthermore, note that observing the $n$-sample of Poisson processes $(N_1, \dots, N_n)$ with common intensity $f$ with respect to the Lebesgue measure is equivalent to observing the cumulative Poisson process $N = \cup_{i=1}^n N_i$ with intensity $n \times f$ with respect to the Lebesgue measure. In addition, this setting is close to the problem of density estimation where we observe an $n$-sample with density $f / \int f(x)\,dx$.
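
The equivalence between the $n$-sample and the cumulative process can be illustrated numerically. The following is a minimal sketch, not the authors' simulation code: the intensity $f(x) = e^{-|x|}$, the helper name `sample_poisson_process` and the value $n = 50$ are assumptions chosen only for illustration. It uses the standard fact that a Poisson process with finite mean measure can be drawn by first sampling a Poisson number of points and then sampling the points independently from the normalized intensity.

```python
import numpy as np

def sample_poisson_process(n, total_mass, draw_points, rng):
    """Draw the points of a Poisson process with intensity n * f.

    total_mass is the integral of f over the real line, and
    draw_points(k, rng) returns k i.i.d. draws from the density f / total_mass.
    """
    count = rng.poisson(n * total_mass)   # number of points is Poisson(n * int f)
    return draw_points(count, rng)

rng = np.random.default_rng(0)

# Hypothetical intensity f(x) = exp(-|x|): its integral is 2 and
# f / 2 is the standard Laplace density (non-compactly supported).
def laplace_points(k, r):
    return r.laplace(loc=0.0, scale=1.0, size=k)

n = 50
# n independent processes, each with intensity f ...
pieces = [sample_poisson_process(1, 2.0, laplace_points, rng) for _ in range(n)]
union = np.concatenate(pieces)
# ... versus a single process with intensity n * f.
direct = sample_poisson_process(n, 2.0, laplace_points, rng)

print(len(union), len(direct))  # both counts are Poisson with mean n * 2 = 100
```

Both `union` and `direct` have the same distribution; only the way the points are generated differs.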

Our goal is to build constructive data-driven estimators of $f$ and for this purpose, we consider thresholding rules whose risk is measured under the $\mathbb{L}_2$-loss. Our framework is the following. Of course, $f$ is non-negative and since we assume that $\mu(\mathbb{R}) < \infty$, this implies that $f \in \mathbb{L}_1$. Since we consider the $\mathbb{L}_2$-loss, $f$ is assumed to be in $\mathbb{L}_2$. In particular, $f$ is not assumed to be bounded (except in the minimax setting) and, as said previously, its support may be infinite.

In a different setting, the problem of estimating a density with infinite support has been partly solved from the minimax point of view. See 0| where minimax results for a class of functions depending on a gauge are established or [21] and [3] for Sobolev classes. In these papers, the loss function depends on the parameters of the functional class. Similarly, Donoho et al. [17] proved the optimality of wavelet linear estimators on Besov spaces $B^\alpha_{p,q}$ when the $\mathbb{L}_p$-risk is considered. First general results where the loss is independent of the functional class have been pointed out by Juditsky and Lambert-Lacroix [24], who investigated minimax rates on the particular class of the Besov spaces $B^\alpha_{\infty,\infty}$ for the $\mathbb{L}_p$-risk. When $p > 2 + 1/\alpha$, the minimax risk is bounded by $(\log(n)/n)^{2\alpha/(1+2\alpha)}$ so is of the same order up to a logarithmic term as in the equivalent estimation problem on $[0, 1]$. However, the behavior of the minimax risk changes dramatically when $p \le 2 + 1/\alpha$, and in this case, it depends on $p$. In addition, Juditsky and Lambert-Lacroix [24] pointed out a data-driven thresholding procedure achieving minimax rates up to a logarithmic term. In the maxiset setting, this procedure has been studied by Autin [l| and compared to other classical thresholding procedures. Finally, we can also mention that Bunea et al. 0| established oracle inequalities without any support assumption by using Lasso-type estimators.



1.2 The estimation procedure 

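Now, let us describe the estimation procedure considered in our paper. For this purpose, we assume in the following that the function $f$ can be written as follows:

$$f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda, \quad \text{with } \beta_\lambda = \int f(x)\varphi_\lambda(x)\,dx, \qquad (1.1)$$

where $(\varphi_\lambda)_{\lambda \in \Lambda}$ and $(\tilde\varphi_\lambda)_{\lambda \in \Lambda}$ are two infinite families of linearly independent functions of $\mathbb{L}_2$. Most of the further results are valid by taking $(\varphi_\lambda)_{\lambda \in \Lambda} = (\tilde\varphi_\lambda)_{\lambda \in \Lambda}$ to be an orthonormal basis of $\mathbb{L}_2$ (the Haar basis for instance). However, minimax results are established by considering special cases of biorthogonal wavelet bases and in this case $(\varphi_\lambda)_{\lambda \in \Lambda}$ and $(\tilde\varphi_\lambda)_{\lambda \in \Lambda}$ are different (see Section 3). We note

$$\|f\|_{\tilde\varphi} = \Big(\sum_{\lambda \in \Lambda} \beta_\lambda^2\Big)^{1/2},$$

which is equal to the $\mathbb{L}_2$-norm of $f$ if $(\tilde\varphi_\lambda)_{\lambda \in \Lambda}$ is orthonormal. We consider thresholding estimators based on observations $(\hat\beta_\lambda)_{\lambda \in \Gamma_n}$, where $\Gamma_n$ is a subset of $\Lambda$ chosen later and

$$\forall \lambda \in \Lambda, \quad \hat\beta_\lambda = \frac{1}{n}\int \varphi_\lambda(x)\,dN_x.$$

Observe that for all $\lambda \in \Lambda$, $\hat\beta_\lambda$ is an unbiased estimator of $\beta_\lambda$. As Juditsky and Lambert-Lacroix [24], we threshold $\hat\beta_\lambda$ according to a random positive function of $\lambda$ depending on $n$ and on a parameter $\gamma$ fixed later, denoted by $\eta_{\lambda,\gamma}$, and the thresholding estimator of $f$ is

$$\tilde f_{n,\gamma} = \sum_{\lambda \in \Gamma_n} \tilde\beta_\lambda \tilde\varphi_\lambda, \qquad (1.2)$$

where

$$\tilde\beta_\lambda = \hat\beta_\lambda 1_{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}}.$$

In the sequel, we denote $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$.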
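
The two ingredients of the procedure, the empirical coefficient (an average of the analysis function over the observed points) and the hard-thresholding step, can be sketched as follows for the Haar wavelet. This is only an illustration under stated assumptions: the function names are hypothetical, the threshold is passed in as a plain number rather than computed from the data, and the Haar wavelet is taken with half-open intervals, which differs from the paper's convention only on a set of measure zero.

```python
import numpy as np

def haar_psi(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    up = ((x >= 0) & (x < 0.5)).astype(float)
    down = ((x >= 0.5) & (x < 1)).astype(float)
    return up - down

def empirical_coefficient(points, n, j, k):
    """Empirical coefficient: (1/n) * sum over observed points T of psi_{j,k}(T)."""
    points = np.asarray(points, dtype=float)
    values = 2.0 ** (j / 2) * haar_psi(2.0 ** j * points - k)
    return float(values.sum() / n)

def hard_threshold(beta_hat, eta):
    """Keep the empirical coefficient if its magnitude reaches the threshold."""
    return beta_hat if abs(beta_hat) >= eta else 0.0

# Toy data: three observed points and n = 3.
points = [0.1, 0.2, 0.6]
b00 = empirical_coefficient(points, n=3, j=0, k=0)   # (1 + 1 - 1) / 3
b10 = empirical_coefficient(points, n=3, j=1, k=0)   # sqrt(2) * (1 + 1 + 0) / 3
kept = hard_threshold(b00, eta=0.2)
killed = hard_threshold(b00, eta=0.5)
```

In the actual procedure the threshold is random and coefficient-dependent; here a fixed value is used only to show the keep-or-kill mechanism.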

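The procedure (1.2) can also be seen as a model selection procedure. Indeed, for all $g = \sum_{\lambda \in \Lambda} \alpha_\lambda \tilde\varphi_\lambda$, we define the least squares contrast by

$$\gamma_n(g) = -2\sum_{\lambda \in \Lambda} \alpha_\lambda \hat\beta_\lambda + \sum_{\lambda \in \Lambda} \alpha_\lambda^2.$$

For every subset of indices $m$, we denote by $S_m$ the subspace generated by $\{\tilde\varphi_\lambda, \lambda \in m\}$. The projection estimator onto $S_m$ is defined by

$$\hat f_m = \arg\min_{g \in S_m} \gamma_n(g) = \sum_{\lambda \in m} \hat\beta_\lambda \tilde\varphi_\lambda.$$

Note that

$$\gamma_n(\hat f_m) = -\sum_{\lambda \in m} \hat\beta_\lambda^2.$$

If we set

$$\mathrm{pen}(m) = \sum_{\lambda \in m} \eta_{\lambda,\gamma}^2,$$

then the thresholding estimator can be seen as a penalized projection estimator since we have

$$\tilde f_{n,\gamma} = \hat f_{\hat m} = \sum_{\lambda \in \Gamma_n} \hat\beta_\lambda 1_{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}} \tilde\varphi_\lambda$$

with

$$\hat m = \arg\min_{m \subset \Gamma_n} \big[\gamma_n(\hat f_m) + \mathrm{pen}(m)\big]. \qquad (1.3)$$

Such an interpretation is used in Section 4.1 and for the proof of the main result of this paper.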
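
The link between thresholding and penalized model selection can be checked by brute force on a small index set. The sketch below assumes the reading of the criterion given above: for a model $m$, the contrast of the projection estimator plus the penalty equals the sum over $m$ of (threshold squared minus empirical coefficient squared), so the minimizer keeps exactly the indices whose coefficient exceeds its threshold in absolute value. The numerical values and function names are hypothetical.

```python
from itertools import combinations

def penalized_criterion(model, beta_hat, eta):
    """Contrast of the projection estimator plus penalty:
    sum over the model of (eta^2 - beta_hat^2)."""
    return sum(eta[l] ** 2 - beta_hat[l] ** 2 for l in model)

def best_model_brute_force(beta_hat, eta):
    """Minimize the penalized criterion over all subsets of indices."""
    indices = range(len(beta_hat))
    candidates = (
        set(m)
        for size in range(len(beta_hat) + 1)
        for m in combinations(indices, size)
    )
    return min(candidates, key=lambda m: penalized_criterion(m, beta_hat, eta))

def thresholded_model(beta_hat, eta):
    """Indices kept by hard thresholding."""
    return {l for l in range(len(beta_hat)) if abs(beta_hat[l]) >= eta[l]}

# Hypothetical empirical coefficients and thresholds (no ties).
beta_hat = [0.9, -0.05, 0.3, -0.6, 0.1]
eta = [0.2, 0.2, 0.4, 0.5, 0.05]

selected = best_model_brute_force(beta_hat, eta)
print(selected == thresholded_model(beta_hat, eta))
```

When a coefficient is exactly equal to its threshold, including it or not leaves the criterion unchanged, so the choice between a strict and a non-strict inequality does not affect the minimal value.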



1.3 Overview of the paper 

In this paper, our goals are threefold. First of all, we wish to derive theoretical results for the $\mathbb{L}_2$-risk of $\tilde f_\gamma$ by using three different points of view (oracle, maxiset and minimax), then we wish to discuss precisely the choice of the threshold and finally we wish to perform some simulations.

Let us now describe our results for our first aim. Theorem 1 is the main result of the paper. With a convenient choice of the threshold and under very mild assumptions on $\Gamma_n$, Theorem 1 proves that the thresholding estimate $\tilde f_\gamma$ satisfies an oracle type inequality. We emphasize that this result is valid under very mild assumptions on $f$. Indeed, classical procedures use a bound for the sup-norm of $f$ (see [10], [17] or [31]). This is not the case here where the threshold is the sum of two terms, a purely random one that is the main term and a deterministic one (see (2.2)). The definition of the threshold is extensively discussed in Section 2. By using biorthogonal wavelet bases, we derive from Theorem 1 the oracle inequality satisfied by $\tilde f_\gamma$. More precisely, Theorem 2 in Section 4.1 shows that $\tilde f_\gamma$ achieves the same performance as the oracle estimator up to a logarithmic term which is the price to pay for adaptation. From Theorem 2, we derive the maxiset results of this paper. Let us recall that the maxiset approach consists in investigating the maximal space (maxiset) where a given procedure achieves a given rate of convergence. For the maxiset theory, there is no a priori functional assumption. For a given procedure, the practitioner states the desired accuracy by fixing a rate and points out all the functions that can be estimated at this rate by the procedure. Obviously, the larger the maxiset, the better the procedure. We prove in Section 4.2 that under mild conditions, the maxiset of the estimate $\tilde f_\gamma$ for classical rates of the form $(\log(n)/n)^{\alpha/(1+2\alpha)}$ is, roughly speaking, the intersection of two spaces: a weak Besov space denoted $W_\alpha$ and the classical Besov space $B^{\alpha/(1+2\alpha)}_{2,\infty}$ (see Theorem 3 and Section 4.2 for more details). Interestingly, this maxiset result provides examples of unbounded functions that can be estimated at the rate $(\log(n)/n)^{\alpha/(1+2\alpha)}$ when $0 < \alpha < 1/4$ (see Proposition 1). Furthermore, we derive from the maxiset result most of the minimax results briefly described now.



As said previously, Juditsky and Lambert-Lacroix [24] established minimax rates for the problem of estimating a density with an infinite support for the particular class of Besov spaces $B^\alpha_{\infty,\infty}$ and for the $\mathbb{L}_p$-loss. To the best of our knowledge, minimax rates are unknown for Besov spaces $B^\alpha_{p,q}$ except for very special cases described above. Our goal is to deal with this issue in the Poisson setting and for the $\mathbb{L}_2$-loss. We emphasize that for the minimax setting, we assume that the function to be estimated is bounded. The results that we obtain are the following. When $p \le 2$, under mild assumptions, the minimax rate of convergence associated with $B^\alpha_{p,q}$ is the classical rate $n^{-\alpha/(1+2\alpha)}$ up to a logarithmic term. So, it is of the same order as in the equivalent estimation problem on compact sets of $\mathbb{R}$. Furthermore, our estimate achieves this rate up to a logarithmic term. When $p > 2$, using our maxiset result, we prove that this last result concerning our procedure is no longer true. But we prove under mild conditions that the rate of $\tilde f_\gamma$ is not larger than $(\log(n)/n)^{\alpha/(2+2\alpha-1/p)}$ up to a constant. Note that when $p = \infty$, $(\log(n)/n)^{\alpha/(2+2\alpha)}$ is the rate pointed out by Juditsky and Lambert-Lacroix [24] for minimax estimation under the $\mathbb{L}_2$-loss on the space $B^\alpha_{\infty,\infty}$. Of course, when compactly supported functions are considered, $\tilde f_\gamma$ is adaptive minimax on Besov spaces $B^\alpha_{p,q}$ up to a logarithmic term.

The second goal of the paper is to discuss the choice of the threshold. The starting point of this discussion is as follows. The main term of the threshold is $(2\gamma\log(n)\tilde V_{\lambda,n})^{1/2}$ where $\tilde V_{\lambda,n}$ is an estimate of the variance of $\hat\beta_\lambda$ and $\gamma$ is a constant to be calibrated (see (2.2) for further details). As usual, $\gamma$ has to be large enough to obtain the theoretical results (see Theorem 1). Such an assumption is very classical (see for instance [24], [17], [13] or [J]). But, as illustrated by Juditsky and Lambert-Lacroix [24], it is often too conservative for practical issues. In this paper, the assumption on the constant $\gamma$ is as little conservative as possible and actually most of the results are valid if $\gamma > 1$. So, the first issue is the following: what happens if $\gamma < 1$? Theorem 5 of Section 5 proves that the rate obtained for estimating the simple function $1_{[0,1]}$ is larger than $n^{-(\gamma+\varepsilon)/2}$ for any $\varepsilon > 0$. This proves that $\gamma < 1$ is a bad choice since, with $\gamma > 1$, we achieve the parametric rate up to a logarithmic term. Finally we consider a special class of intensity functions denoted $\mathcal{F}_n$. Theorems 9 and 10 provide upper and lower bounds of the maximal ratio on $\mathcal{F}_n$ of the risk of $\tilde f_\gamma$ by the oracle risk and prove that $\gamma$ should not be too large.

Finally we validate the previous range of $\gamma$ and refine it through a simulation study so that one can claim that $\gamma = 1$ is a fairly good choice for all the encountered situations (finite/infinite support, bounded/unbounded intensity, smooth/non-smooth functions).
1.4 Outlines

The paper is organized as follows. In Section 2, the main result of this paper is established. Then, Section 3 introduces biorthogonal wavelet bases that are used to give oracle, maxiset and minimax results pointed out in Section 4. Section 5 discusses the choice of the threshold, whereas Section 6 provides some simulations. Finally, Section 7 gives the proof of the theoretical results.



2 The main result 



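In the sequel, for $R > 0$, if $\mathcal{F}$ is a given Banach space, we denote by $\mathcal{F}(R)$ the ball of radius $R$ associated with $\mathcal{F}$. For any $1 \le p < \infty$, we denote

$$\|g\|_p = \Big(\int |g(x)|^p\,dx\Big)^{1/p},$$

with the usual modification for $p = \infty$. To state the main result, let us introduce the following notations that are used throughout the paper. We set

$$\forall \lambda \in \Lambda, \quad \hat V_{\lambda,n} = \int_{\mathbb{R}} \frac{\varphi_\lambda^2(x)}{n^2}\,dN_x,$$

the natural estimate of $V_{\lambda,n}$ that is the variance of $\hat\beta_\lambda$:

$$\forall \lambda \in \Lambda, \quad V_{\lambda,n} = E(\hat V_{\lambda,n}) = \frac{\sigma_\lambda^2}{n},$$

where

$$\forall \lambda \in \Lambda, \quad \sigma_\lambda^2 = \int \varphi_\lambda^2(x) f(x)\,dx.$$
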
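Theorem 1. We assume that (1.1) is true and $\Gamma_n$ is such that for $\lambda \in \Gamma_n$,

$$\|\varphi_\lambda\|_\infty \le c_{\varphi,n}\sqrt{n}, \qquad (2.1)$$

and that for all $x \in \mathbb{R}$,

$$\mathrm{card}\{\lambda \in \Gamma_n : \varphi_\lambda(x) \ne 0\} \le m_{\varphi,n}\log n,$$

where $c_{\varphi,n}$ and $m_{\varphi,n}$ depend on $n$ and on the family $(\varphi_\lambda)_{\lambda \in \Lambda}$. Let $\gamma > 1$. We set

$$\eta_{\lambda,\gamma} = \sqrt{2\gamma\log n\,\tilde V_{\lambda,n}} + \frac{\gamma\log n}{3n}\|\varphi_\lambda\|_\infty, \qquad (2.2)$$

where

$$\tilde V_{\lambda,n} = \hat V_{\lambda,n} + \sqrt{2\gamma\log n\,\hat V_{\lambda,n}\frac{\|\varphi_\lambda\|_\infty^2}{n^2}} + 3\gamma\log n\,\frac{\|\varphi_\lambda\|_\infty^2}{n^2},$$

and consider $\tilde f_{n,\gamma}$ defined in (1.2). Then for all $\varepsilon < \gamma - 1$ and for all $p \ge 2$ and $q$ such that $\frac{1}{p} + \frac{1}{q} = 1$ and $\frac{\gamma}{q} > 1 + \varepsilon$,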



:E(l/„, 7 - /ll|) < E 



I A^m Aem Asm 



+ 



2 + e 

+ c {l + £)p 2 \\f\\ 1 c 2 ^ n m^ n log{n) [n~^ + n~?(max(||/||i; l))i 
where cq is an absolute constant. 
Note that this result is proved under very mild conditions on the decomposition of $f$. In particular, we never use in the proof that we are working on the real line but only that the decomposition (1.1) exists. Observe also that if we use wavelet bases (see (3.1) in Section 3 below where we recall the standard wavelet setting) and if

$$\Gamma_n \subset \{\lambda = (j,k) \in \Lambda : 2^j \le n^c\},$$

where $c$ is a constant, then $m_{\varphi,n}$ does not depend on $n$ and in addition,

$$\sup_n \big[c_{\varphi,n}\, n^{-(c-1)/2}\big] < \infty.$$




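The threshold seems to be defined in a rather complicated manner. But, first observe that for all $\theta > 0$ and all $\lambda \in \Gamma_n$,

$$\sqrt{2\gamma\log(n)\hat V_{\lambda,n}} + \frac{\gamma\log(n)}{3n}\|\varphi_\lambda\|_\infty \le \eta_{\lambda,\gamma} \le c_{1,\theta}\sqrt{2\gamma\log(n)\hat V_{\lambda,n}} + c_{2,\theta}\frac{\gamma\log(n)}{n}\|\varphi_\lambda\|_\infty \qquad (2.3)$$
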

with $c_{1,\theta} = \sqrt{1 + 1/(2\theta)}$ and $c_{2,\theta} = 1/3 + \sqrt{6 + 2\theta}$.



Since $\hat V_{\lambda,n}$ is the natural estimate of $V_{\lambda,n}$, the first term of the left-hand side of (2.3) is similar to the threshold introduced by Juditsky and Lambert-Lacroix [24] in the density estimation setting. But unlike Juditsky and Lambert-Lacroix [24], we add a deterministic term that allows us to consider $\gamma$ close to 1 and to control large deviation terms. In addition, since $\eta_{\lambda,\gamma}$ cannot be equal to 0, this allows us to deal with very irregular functions. However, observe that, most of the time, the deterministic term is negligible compared to the first term as soon as $\lambda \in \Gamma_n$ satisfies $\|\varphi_\lambda\|_\infty = o_n(n^{1/2})$. Finally, in the same spirit, $V_{\lambda,n}$ is slightly overestimated and we consider $\tilde V_{\lambda,n}$ instead of $\hat V_{\lambda,n}$ to define the threshold.
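
As a reading aid, the threshold can be written as a short function. This is a sketch under an explicit assumption about the formula, namely that the threshold is $\sqrt{2\gamma\log n\,\tilde V} + \gamma\log n\,\|\varphi_\lambda\|_\infty/(3n)$ with $\tilde V = \hat V + \sqrt{2\gamma\log n\,\hat V\,\|\varphi_\lambda\|_\infty^2/n^2} + 3\gamma\log n\,\|\varphi_\lambda\|_\infty^2/n^2$; the function and argument names are hypothetical.

```python
import math

def threshold(v_hat, sup_norm, n, gamma):
    """Random threshold for one coefficient.

    v_hat    : empirical variance estimate of the coefficient
    sup_norm : sup-norm of the analysis function
    n, gamma : sample-size parameter and tuning constant (gamma > 1 in theory)
    """
    log_n = math.log(n)
    # Slightly inflated variance estimate.
    v_tilde = (
        v_hat
        + math.sqrt(2 * gamma * log_n * v_hat * sup_norm ** 2 / n ** 2)
        + 3 * gamma * log_n * sup_norm ** 2 / n ** 2
    )
    # Random main term plus deterministic term.
    return math.sqrt(2 * gamma * log_n * v_tilde) + gamma * log_n * sup_norm / (3 * n)

eta = threshold(v_hat=0.01, sup_norm=2.0, n=1000, gamma=1.5)
eta_empty = threshold(v_hat=0.0, sup_norm=2.0, n=1000, gamma=1.5)
print(eta > eta_empty > 0)
```

The second call illustrates the remark that the threshold never vanishes: even when no point falls in the support of the analysis function, so that the empirical variance is zero, the deterministic part keeps the threshold strictly positive.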

The result of Theorem 1 is an oracle type inequality. By exchanging the expectation and the infimum, the result provides the expected oracle inequality claimed in Theorem 2 of Section 4.1.
Theorem [2] is derived from Theorem [1] by evaluating E(^ Ae r n vft 7 ) and by using biorthogonal 
wavelet bases. 



3 Biorthogonal wavelet bases and Besov spaces 

In this paper, the intensity $f$ to be estimated is assumed to belong to $\mathbb{L}_1 \cap \mathbb{L}_2$. In this case, $f$ can be decomposed on the Haar wavelet basis and this property is used throughout this paper. However, in Section 4.3, the Haar basis that suffers from lack of regularity is not considered. Instead, we consider a particular class of biorthogonal wavelet bases that are described now. For this purpose, let us set

$$\phi = 1_{[0,1]}.$$

For any $r > 0$, there exist three functions $\psi$, $\tilde\phi$ and $\tilde\psi$ with the following properties:

1. $\tilde\phi$ and $\tilde\psi$ are compactly supported,

2. $\tilde\phi$ and $\tilde\psi$ belong to $C^{r+1}$, where $C^{r+1}$ denotes the Hölder space of order $r + 1$,

3. $\psi$ is compactly supported and is a piecewise constant function,

4. $\psi$ is orthogonal to polynomials of degree no larger than $r$,

5. $\{(\phi_k, \psi_{j,k})_{j \ge 0, k \in \mathbb{Z}}, (\tilde\phi_k, \tilde\psi_{j,k})_{j \ge 0, k \in \mathbb{Z}}\}$ is a biorthogonal family: $\forall j, j' \ge 0$, $\forall k, k' \in \mathbb{Z}$,

$$\int_{\mathbb{R}} \psi_{j,k}(x)\tilde\phi_{k'}(x)\,dx = \int_{\mathbb{R}} \phi_k(x)\tilde\psi_{j',k'}(x)\,dx = 0,$$

$$\int_{\mathbb{R}} \phi_k(x)\tilde\phi_{k'}(x)\,dx = 1_{k=k'}, \quad \int_{\mathbb{R}} \psi_{j,k}(x)\tilde\psi_{j',k'}(x)\,dx = 1_{j=j',k=k'},$$

where for any $x \in \mathbb{R}$ and for any $(j,k) \in \mathbb{Z}^2$,

$$\phi_k(x) = \phi(x - k), \quad \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$$

and

$$\tilde\phi_k(x) = \tilde\phi(x - k), \quad \tilde\psi_{j,k}(x) = 2^{j/2}\tilde\psi(2^j x - k).$$

This implies that for any $f \in \mathbb{L}_1 \cap \mathbb{L}_2$, for any $x \in \mathbb{R}$,

$$f(x) = \sum_{k \in \mathbb{Z}} \alpha_k \tilde\phi_k(x) + \sum_{j \ge 0}\sum_{k \in \mathbb{Z}} \beta_{j,k}\tilde\psi_{j,k}(x),$$

where for any $j \ge 0$ and any $k \in \mathbb{Z}$,

$$\alpha_k = \int_{\mathbb{R}} f(x)\phi_k(x)\,dx, \quad \beta_{j,k} = \int_{\mathbb{R}} f(x)\psi_{j,k}(x)\,dx.$$

Such biorthogonal wavelet bases have been built by Cohen et al. [11] as a special case of spline systems (see also the elegant equivalent construction of Donoho [15] from boxcar functions). Of course, recall that all these properties except the second and the fourth ones are true for the Haar basis, where $\tilde\phi = \phi$ and $\tilde\psi = \psi = 1_{[0,1/2]} - 1_{]1/2,1]}$, which allows us to obtain in addition an orthonormal basis. This last point is not true for general biorthogonal wavelet bases but we have the frame property: there exist two constants $c_1$ and $c_2$ only depending on the basis such that

$$c_1\Big(\sum_{k \in \mathbb{Z}} \alpha_k^2 + \sum_{j \ge 0}\sum_{k \in \mathbb{Z}} \beta_{j,k}^2\Big) \le \|f\|_2^2 \le c_2\Big(\sum_{k \in \mathbb{Z}} \alpha_k^2 + \sum_{j \ge 0}\sum_{k \in \mathbb{Z}} \beta_{j,k}^2\Big).$$

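In the sequel, when wavelet bases are used, we set

$$\Lambda = \{\lambda = (j,k) : j \ge -1, k \in \mathbb{Z}\}. \qquad (3.1)$$

We denote for any $\lambda \in \Lambda$, $\varphi_\lambda = \phi_k$ (respectively $\tilde\varphi_\lambda = \tilde\phi_k$) if $\lambda = (-1,k)$ and $\varphi_\lambda = \psi_{j,k}$ (respectively $\tilde\varphi_\lambda = \tilde\psi_{j,k}$) if $\lambda = (j,k)$ with $j \ge 0$. Similarly, $\beta_\lambda = \alpha_k$ if $\lambda = (-1,k)$ and $\beta_\lambda = \beta_{j,k}$ if $\lambda = (j,k)$ with $j \ge 0$. So, (1.1) is valid. An important feature of the bases introduced previously is the following: there exists a constant $\mu_\psi > 0$ such that

$$\inf_{x \in [0,1]} |\phi(x)| \ge 1, \quad \inf_{x \in \mathrm{supp}(\psi)} |\psi(x)| \ge \mu_\psi, \qquad (3.2)$$

where $\mathrm{supp}(\psi) = \{x \in \mathbb{R} : \psi(x) \ne 0\}$. This property is used throughout the paper.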
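
For the Haar special case, the orthonormality relations and the lower bound on the analysis wavelet over its support can be verified numerically. The sketch below is only an illustration for the Haar basis (it does not construct the smoother biorthogonal systems); the grid resolution and helper names are assumptions, and half-open intervals are used, which changes the functions only on a set of measure zero.

```python
import numpy as np

def phi(x):
    """Haar father wavelet, taken here as the indicator of [0, 1)."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0) & (x < 1)).astype(float)

def psi(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0) & (x < 0.5)).astype(float) - ((x >= 0.5) & (x < 1)).astype(float)

def psi_jk(x, j, k):
    """Dilated and translated wavelet 2^{j/2} psi(2^j x - k)."""
    return 2.0 ** (j / 2) * psi(2.0 ** j * np.asarray(x, dtype=float) - k)

# Midpoints of the dyadic cells of width 2^-10 covering [-2, 3).
STEP = 2.0 ** -10
GRID = np.arange(-2.0, 3.0, STEP) + STEP / 2

def inner(u, v):
    """Midpoint rule; exact up to rounding because Haar functions of
    level j <= 9 are constant on each cell of the grid."""
    return float(np.sum(u(GRID) * v(GRID)) * STEP)

norm_sq = inner(lambda x: psi_jk(x, 3, 5), lambda x: psi_jk(x, 3, 5))
cross_levels = inner(lambda x: psi_jk(x, 0, 0), lambda x: psi_jk(x, 2, 1))
cross_shift = inner(lambda x: psi_jk(x, 1, 0), lambda x: psi_jk(x, 1, 1))
father_mother = inner(phi, lambda x: psi_jk(x, 0, 0))

# On its support the Haar wavelet has absolute value 1.
inside = GRID[(GRID > 0) & (GRID < 1)]
min_abs_psi = float(np.min(np.abs(psi(inside))))
```

Here `norm_sq` should be close to 1 and the three cross products close to 0, up to floating-point rounding.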

Now, let us recall some properties of Besov spaces that are extensively used in the next section. We refer the reader to [13] and [20] for the definition of Besov spaces, denoted $B^\alpha_{p,q}$ in the sequel, and a review of their properties explaining their important role in approximation theory and statistics. We just recall the sequential characterization of Besov spaces by using the biorthogonal wavelet basis (for further details, see [12]). Let $1 \le p, q \le \infty$ and $0 < \alpha < r + 1$; the $B^\alpha_{p,q}$-norm of $f$ is equivalent to the norm

$$\|f\|_{\alpha,p,q} = \begin{cases} \|(\alpha_k)_k\|_{\ell_p} + \Big[\sum_{j \ge 0} 2^{jq(\alpha + 1/2 - 1/p)}\|(\beta_{j,k})_k\|_{\ell_p}^q\Big]^{1/q} & \text{if } q < \infty, \\ \|(\alpha_k)_k\|_{\ell_p} + \sup_{j \ge 0} 2^{j(\alpha + 1/2 - 1/p)}\|(\beta_{j,k})_k\|_{\ell_p} & \text{if } q = \infty. \end{cases}$$


We use this norm to define the radius of Besov balls. For any $R > 0$, if $0 < \alpha' \le \alpha < r + 1$, $1 \le p \le p' \le \infty$ and $1 \le q \le q' \le \infty$, we obviously have

$$B^\alpha_{p,q}(R) \subset B^\alpha_{p,q'}(R), \quad B^\alpha_{p,q}(R) \subset B^{\alpha'}_{p,q}(R).$$

Moreover

$$B^\alpha_{p,q}(R) \subset B^{\alpha'}_{p',q}(R) \quad \text{if } \alpha - \frac{1}{p} \ge \alpha' - \frac{1}{p'}. \qquad (3.3)$$

The class of Besov spaces $B^\alpha_{p,\infty}$ provides a useful tool to classify wavelet decomposed signals according to their regularity and sparsity properties (see [23]). Roughly speaking, regularity increases when $\alpha$ increases whereas sparsity increases when $p$ decreases. In particular, the spaces with indices $p < 2$ are of particular interest since they describe very wide classes of inhomogeneous but sparse functions (i.e. with a small number of significant coefficients). The case $p \ge 2$ is typical of dense functions.
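
The sequential Besov norm is easy to evaluate on a finite array of coefficients. The following sketch assumes the standard form of this norm, an $\ell_p$ norm of the scaling coefficients plus a weighted $\ell_q$ (or sup) combination over levels of the $\ell_p$ norms of the wavelet coefficients with weight $2^{j(\alpha + 1/2 - 1/p)}$, and truncates it to finitely many levels; the function name and the toy coefficients are hypothetical.

```python
import numpy as np

def besov_sequence_norm(alpha_coeffs, beta_levels, alpha, p, q):
    """Truncated sequential Besov norm.

    alpha_coeffs : scaling coefficients (alpha_k)_k
    beta_levels  : list whose entry j holds the wavelet coefficients (beta_{j,k})_k
    alpha, p, q  : smoothness and integrability indices (p, q may be np.inf)
    """
    def lp_norm(v):
        v = np.abs(np.asarray(v, dtype=float))
        if v.size == 0:
            return 0.0
        if np.isinf(p):
            return float(v.max())
        return float((v ** p).sum() ** (1.0 / p))

    inv_p = 0.0 if np.isinf(p) else 1.0 / p
    weighted = [
        2.0 ** (j * (alpha + 0.5 - inv_p)) * lp_norm(level)
        for j, level in enumerate(beta_levels)
    ]
    if np.isinf(q):
        detail = max(weighted, default=0.0)
    else:
        detail = sum(w ** q for w in weighted) ** (1.0 / q)
    return lp_norm(alpha_coeffs) + detail

# Toy coefficients: one scaling coefficient and two detail levels.
scaling = [1.0]
details = [[1.0], [0.5, 0.5]]
norm_sup = besov_sequence_norm(scaling, details, alpha=0.5, p=2, q=np.inf)
norm_l2 = besov_sequence_norm(scaling, details, alpha=0.5, p=2, q=2)
```

With these toy values both detail levels have weighted norm 1, so the sup version equals 2 and the $q = 2$ version equals $1 + \sqrt{2}$.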

4 Oracle, maxiset and minimax results 

Throughout this section, we use biorthogonal wavelet bases as defined in Section 3.
4.1 Oracle inequalities

Ideal adaptation is studied in [16] using the class of shrinkage rules in the context of wavelet function estimation. This is the performance that can be achieved with the aid of an oracle. In our setting, the oracle does not tell us the true function, but tells us, for our thresholding method, the coefficients that have to be kept. This "estimator" obtained with the aid of an oracle is not a true estimator, of course, since it depends on $f$. But it represents an ideal for a particular estimation method. The approach of ideal adaptation is to derive true estimators which can essentially "mimic" the performance of the oracle estimator. So, using the interpretation of thresholding rules as model selection rules, the oracle provides the model $\bar m \subset \Gamma_n$ such that the quadratic risk of $\hat f_{\bar m}$ is minimum. Since we have for any $m \subset \Gamma_n$,

$$E(\|\hat f_m - f\|_{\tilde\varphi}^2) = \sum_{\lambda \in m} V_{\lambda,n} + \sum_{\lambda \notin m} \beta_\lambda^2,$$

the oracle estimator $\hat f_{\bar m}$ is obtained by taking $\bar m = \{\lambda \in \Gamma_n : \beta_\lambda^2 > V_{\lambda,n}\}$ and

$$\hat f_{\bar m} = \sum_{\lambda \in \Gamma_n} \hat\beta_\lambda 1_{\beta_\lambda^2 > V_{\lambda,n}} \tilde\varphi_\lambda.$$

Its risk (the oracle risk) is then

$$E(\|\hat f_{\bar m} - f\|_{\tilde\varphi}^2) = E\Big[\sum_{\lambda \in \Gamma_n} \big(\hat\beta_\lambda 1_{\beta_\lambda^2 > V_{\lambda,n}} - \beta_\lambda\big)^2\Big] + \sum_{\lambda \notin \Gamma_n} \beta_\lambda^2 = \sum_{\lambda \in \Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}) + \sum_{\lambda \notin \Gamma_n} \beta_\lambda^2.$$
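
The oracle risk is a simple function of the true coefficients and of the variances. The sketch below, with hypothetical names and toy numbers, computes the risk of an arbitrary model (variance terms for the kept indices, squared bias for the others), the oracle model, and the oracle risk as a sum of minima plus the energy of the coefficients outside the index set.

```python
def model_risk(model, beta, variance, tail_energy=0.0):
    """Risk of the projection estimator on a model: variance for kept
    indices, squared coefficient for dropped ones, plus the energy of
    the coefficients outside the index set."""
    kept = set(model)
    inside = sum(
        variance[i] if i in kept else beta[i] ** 2 for i in range(len(beta))
    )
    return inside + tail_energy

def oracle_model(beta, variance):
    """Indices whose squared coefficient exceeds its variance."""
    return [i for i in range(len(beta)) if beta[i] ** 2 > variance[i]]

def oracle_risk(beta, variance, tail_energy=0.0):
    """Sum of min(beta^2, V) over the index set plus the tail energy."""
    return sum(min(b * b, v) for b, v in zip(beta, variance)) + tail_energy

# Toy values: three coefficients with common variance 0.04.
beta = [1.0, 0.1, -0.5]
variance = [0.04, 0.04, 0.04]
best = oracle_model(beta, variance)
risk = oracle_risk(beta, variance, tail_energy=0.01)
```

On this toy example the oracle keeps the first and third coefficients, and its risk (0.04 + 0.01 + 0.04 + 0.01) coincides with the risk of that model.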
Our aim is now to compare the risk of $\tilde f_{n,\gamma}$ to the oracle risk. We deduce from Theorem 1 the following result.

Theorem 2. Let us fix two constants $c \ge 1$ and $c' \in \mathbb{R}$, and let us define for any $n$, $j_0 = j_0(n)$ the integer such that $2^{j_0} \le n^c(\log(n))^{c'} < 2^{j_0+1}$. Let $\gamma > c$ and let $\eta_{\lambda,\gamma}$ be as in Theorem 1. Then $\tilde f_{n,\gamma}$ defined with

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}$$

achieves the following oracle inequality:

$$E(\|\tilde f_{n,\gamma} - f\|_{\tilde\varphi}^2) \le C_1(\gamma,\varphi)\Big[\sum_{\lambda \in \Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}\log(n)) + \sum_{\lambda \notin \Gamma_n} \beta_\lambda^2\Big] + \frac{C_2(\gamma, \|f\|_1, c, c', \varphi)}{n}, \qquad (4.1)$$

where $C_1(\gamma,\varphi)$ is a positive constant depending only on the basis and on the value of $\gamma$ and where $C_2(\gamma, \|f\|_1, c, c', \varphi)$ is also a positive constant depending on $\gamma$ and the basis but also on $\|f\|_1$, $c$ and $c'$.

The oracle inequality (4.1) satisfied by $\tilde f_{n,\gamma}$ proves that this estimator achieves essentially the oracle risk up to a logarithmic term. This logarithmic term is the price we pay for adaptivity, i.e. for not knowing the wavelet coefficients that have to be kept. In Section 5, optimization of the constants of the stated result is performed for a particular class of functions.


4.2 Maxiset results 

As said in the introduction, if $f^*$ is a given procedure, the maxiset study of $f^*$ consists in deciding the accuracy of the estimate by fixing a prescribed rate $\rho^*$ and in pointing out all the functions $f$ such that $f$ can be estimated by the procedure $f^*$ at the target rate $\rho^*$. The maxiset of the procedure $f^*$ for this rate $\rho^*$ is the set of all these functions. So, we set the following definition.

Definition 2. Let $\rho^* = (\rho^*_n)_n$ be a decreasing sequence of positive real numbers and let $f^* = (f^*_n)_n$ be an estimation procedure. The maxiset of $f^*$ associated with the rate $\rho^*$ and the $\mathbb{L}_2$-loss is

$$MS(f^*, \rho^*) = \Big\{f \in \mathbb{L}_1 \cap \mathbb{L}_2 : \sup_n \big[(\rho^*_n)^{-2} E\|f^*_n - f\|_{\tilde\varphi}^2\big] < +\infty\Big\},$$

the ball of radius $R > 0$ of the maxiset is defined by

$$MS(f^*, \rho^*)(R) = \Big\{f \in \mathbb{L}_1 \cap \mathbb{L}_2 : \sup_n \big[(\rho^*_n)^{-2} E\|f^*_n - f\|_{\tilde\varphi}^2\big] \le R^2\Big\}.$$

To establish the maxiset result of this section, we use Theorem 2, so we need to assume that the estimation procedure is performed in a ball of $\mathbb{L}_1 \cap \mathbb{L}_2$. Even if the size of the balls does not play an important role, this assumption is essential. In this setting, we use the following notation. If $\mathcal{F}$ is a given space,

$$MS(f^*, \rho^*) := \mathcal{F}$$

means in the sequel that for any $R > 0$, there exists $R' > 0$ such that

$$MS(f^*, \rho^*)(R) \cap \mathbb{L}_1(R) \cap \mathbb{L}_2(R) \subset \mathcal{F}(R') \cap \mathbb{L}_1(R) \cap \mathbb{L}_2(R)$$

and for any $R' > 0$, there exists $R > 0$ such that

$$\mathcal{F}(R') \cap \mathbb{L}_1(R') \cap \mathbb{L}_2(R') \subset MS(f^*, \rho^*)(R) \cap \mathbb{L}_1(R') \cap \mathbb{L}_2(R').$$

In this section, for any $\alpha > 0$, we investigate the set of functions that can be estimated by $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$ at the rate $\rho_\alpha = (\rho_{n,\alpha})_n$, where for any $n$,

$$\rho_{n,\alpha} = \Big(\frac{\log(n)}{n}\Big)^{\alpha/(1+2\alpha)}.$$

More precisely, we investigate for any radius $R > 0$:

$$MS(\tilde f_\gamma, \rho_\alpha)(R) = \Big\{f \in \mathbb{L}_1 \cap \mathbb{L}_2 : \sup_n \big[\rho_{n,\alpha}^{-2} E\|\tilde f_{n,\gamma} - f\|_{\tilde\varphi}^2\big] \le R^2\Big\}.$$


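To characterize maxisets of $\tilde f_\gamma$, we introduce the following spaces.
Definition 3. We define for all $R > 0$ and for all $s > 0$,

$$W_s = \Big\{f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda : \sup_{t > 0} t^{-4s/(1+2s)} \sum_{\lambda \in \Lambda} \beta_\lambda^2 1_{|\beta_\lambda| \le \sigma_\lambda t} < \infty\Big\},$$

the ball of radius $R$ associated with $W_s$ is:

$$W_s(R) = \Big\{f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda : \sup_{t > 0} t^{-4s/(1+2s)} \sum_{\lambda \in \Lambda} \beta_\lambda^2 1_{|\beta_\lambda| \le \sigma_\lambda t} \le R^{2/(1+2s)}\Big\},$$

and for any sequence of spaces $\Gamma = (\Gamma_n)_n$ included in $\Lambda$,

$$B^s_{2,\Gamma} = \Big\{f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda : \sup_n \Big[\Big(\frac{\log(n)}{n}\Big)^{-2s} \sum_{\lambda \notin \Gamma_n} \beta_\lambda^2\Big] < \infty\Big\}$$

and

$$B^s_{2,\Gamma}(R) = \Big\{f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda : \sup_n \Big[\Big(\frac{\log(n)}{n}\Big)^{-2s} \sum_{\lambda \notin \Gamma_n} \beta_\lambda^2\Big] \le R^2\Big\}.$$
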

In [lj], a justification of the form of the radius of $W_s$ and further details are provided. These spaces can be viewed as weak versions of classical Besov spaces, hence they are called weak Besov spaces in the sequel. In particular, the spaces $W_s$ naturally model sparse signals (see \E^). Note that if for all $n$,

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}$$

with

$$2^{j_0} \le \Big(\frac{n}{\log n}\Big)^c < 2^{j_0+1}, \quad c > 0,$$

then $B^s_{2,\Gamma}$ is the classical Besov space $B^{s/c}_{2,\infty}$ if some properties of regularity and vanishing moments are satisfied by the wavelet basis (see Section 3). We define $B^s_{2,\Gamma}$ and $W_s$ by using biorthogonal wavelet bases. However, as established in [Til], they also have different definitions proving that, under mild conditions, this dependence on the basis is not crucial at all. Using Theorem 2, we have the following result.
the following result. 
Theorem 3. Let us fix two constants $c \ge 1$ and $c' \in \mathbb{R}$, and let us define for any $n$, $j_0 = j_0(n)$ the integer such that $2^{j_0} \le n^c(\log(n))^{c'} < 2^{j_0+1}$. Let $\gamma > c$ and let $\eta_{\lambda,\gamma}$ be as in Theorem 1. Then, the procedure defined in (1.2) with the sequence $\Gamma = (\Gamma_n)_n$ such that

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}$$

achieves the following maxiset performance: for all $\alpha > 0$,

$$MS(\tilde f_\gamma, \rho_\alpha) := B^{\alpha/(1+2\alpha)}_{2,\Gamma} \cap W_\alpha.$$

In particular, if $c' = -c$ and $0 < \frac{\alpha}{c(1+2\alpha)} < r + 1$, where $r$ is the parameter of the biorthogonal basis introduced in Section 3,

$$MS(\tilde f_\gamma, \rho_\alpha) := B^{\alpha/(c(1+2\alpha))}_{2,\infty} \cap W_\alpha.$$

Remark 1. In order to obtain maxisets as large as possible, Inequality (7.7) of the proof of Theorem 3 suggests choosing $\gamma > 1$ as small as possible.

The maxiset of $\tilde f_\gamma$ is characterized by two spaces: a weak Besov space that is directly connected to the thresholding nature of $\tilde f_\gamma$ and the space $B^{\alpha/(1+2\alpha)}_{2,\Gamma}$ that handles the coefficients that are not estimated, which correspond to the indices $j > j_0$. This maxiset result is similar to the result obtained by Autin [H] in the density estimation setting but our assumptions are less restrictive (see Theorem 5.1 of Q).

Now, let us point out a family of examples of functions that illustrates the previous result. For this purpose, we consider the Haar basis that allows us to have simple formulas for the wavelet coefficients. Let us consider for any $0 < \beta < 1/2$, $f_\beta$ such that

$$\forall x \in \mathbb{R}, \quad f_\beta(x) = x^{-\beta} 1_{x \in ]0,1]}.$$

The following result points out that if $\alpha$ is small enough, for a convenient choice of $\beta$, $f_\beta$ belongs to $MS(\tilde f_\gamma, \rho_\alpha)$ (so $f_\beta$ can be estimated at the rate $\rho_\alpha$), and in addition $f_\beta \notin \mathbb{L}_\infty$.

Proposition 1. We consider the Haar basis and we set $c' = -c$. For $0 < \alpha < 1/4$, under the assumptions of Theorem 3, if

n l-4a 

then for c large enough, 



f €MS(f 7 ,p a ) :=B 2 r'nW a 



where 2a) and W a are viewed as sequence spaces. In addition, fp ^ L^. 

This result is proved by using the Haar basis, so the functional spaces are viewed as sequence spaces. We conjecture that for more general biorthogonal wavelet bases, we can also build unbounded functions that belong to $MS(\tilde f_\gamma, \rho_\alpha)$.

4.3 Minimax results 

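Let $\mathcal{F}$ be a functional space and $\mathcal{F}(R)$ be the ball of radius $R$ associated with $\mathcal{F}$. The function to be estimated is assumed to belong to a ball of $\mathbb{L}_1 \cap \mathbb{L}_2$. Let us recall that a procedure $f^* = (f^*_n)_n$ achieves the rate $\rho^* = (\rho^*_n)_n$ on $\mathcal{F}(R)$ (for the $\mathbb{L}_2$-loss) if

$$\sup_n \Big[(\rho^*_n)^{-2} \sup_{f \in \mathcal{F}(R)} E(\|f^*_n - f\|_{\tilde\varphi}^2)\Big] < \infty.$$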
Let us consider the procedure $\tilde f_\gamma$ and the rate $\rho_\alpha = (\rho_{n,\alpha})_n$ where for any $n$,

$$\rho_{n,\alpha} = \left(\frac{\log(n)}{n}\right)^{\alpha/(1+2\alpha)}$$

as in the previous section. Obviously, $\tilde f_\gamma$ achieves the rate $\rho_\alpha$ on $\mathcal{F}(R)$ if and only if there exists
$R' > 0$ such that

$$\mathcal{F}(R) \subset MS(\tilde f_\gamma, \rho_\alpha)(R') \cap L_1(R') \cap L_2(R').$$

Using results of the previous section, if $c' = -c$ and if properties of regularity and vanishing moments
are satisfied by the wavelet basis, this is satisfied if and only if there exists $R'' > 0$ such that

$$\mathcal{F}(R) \subset \mathcal{B}^{\alpha/(c(1+2\alpha))}_{2,\infty}(R'') \cap W_\alpha(R'') \cap L_1(R'') \cap L_2(R'').$$

We apply this simple rule for Besov balls. So, in the sequel, we assume that the function $f$ to be
estimated belongs to a ball of $L_1 \cap L_2$. In addition, we assume that $f$ also belongs to a ball of $L_\infty$.
This last assumption, which is not necessary to derive maxiset results (see Theorem 3 or Proposition
1), is unavoidable in some sense in the minimax setting. For a precise justification of this point, see
for instance Corollary 1 of jig]. Consequently, in the sequel, we set for any $R > 0$,

$$\mathcal{L}_{1,2,\infty}(R) = \{f : \|f\|_1 \le R,\ \|f\|_2 \le R,\ \|f\|_\infty \le R\}.$$

In the sequel, minimax results depend on the parameter $r$ of the biorthogonal basis introduced in
Section 3 to measure the regularity of the reconstruction wavelets $(\tilde\phi, \tilde\psi)$.



4.3.1 Minimax estimation on Besov spaces $\mathcal{B}^\alpha_{p,q}$ when $p \le 2$

To the best of our knowledge, the minimax rate is unknown for $\mathcal{B}^\alpha_{p,q}$ when $p < \infty$. Let us investigate
this problem by pointing out the minimax properties of $\tilde f_\gamma$ on $\mathcal{B}^\alpha_{p,q}$ when $p \le 2$. We have the following
result.

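Theorem 4. Let $R, R' > 0$, $1 \le p, q \le \infty$ and $\alpha \in \mathbb{R}$ such that $\max(0, 1/p - 1/2) < \alpha < r + 1$. Let
$c \ge 1$ be large enough such that

$$\alpha\left(1 - \frac{1}{c(1+2\alpha)}\right) \ge \frac{1}{p} - \frac{1}{2}. \qquad (4.2)$$

Let us define for any $n$, $j_0 = j_0(n)$ the integer such that

$$2^{j_0} \le n^c (\log(n))^{-c} < 2^{j_0+1}.$$

Then, if $p \le 2$, $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$ defined with

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}$$

and $\gamma > c$ achieves the rate $\rho_\alpha$ on $\mathcal{B}^\alpha_{p,q}(R) \cap \mathcal{L}_{1,2,\infty}(R')$. Indeed, for any $n$,

$$\sup_{f \in \mathcal{B}^\alpha_{p,q}(R) \cap \mathcal{L}_{1,2,\infty}(R')} \mathbb{E}\left(\|\tilde f_{n,\gamma} - f\|^2_{\tilde\varphi}\right) \le C(\gamma, c, R, R', \alpha, p, \varphi) \left(\frac{\log n}{n}\right)^{2\alpha/(1+2\alpha)}, \qquad (4.3)$$

where $C(\gamma, c, R, R', \alpha, p, \varphi)$ depends on $R'$, $\gamma$, $c$, on the parameters of the Besov ball and on the
basis.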

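Furthermore, let $p^* \ge 1$ and $\alpha^* > 0$ such that

$$\alpha^*\left(1 - \frac{1}{c(1+2\alpha^*)}\right) \ge \frac{1}{p^*} - \frac{1}{2}. \qquad (4.4)$$

Then, $\tilde f_\gamma$ is adaptive minimax up to a logarithmic term on

$$\left\{\mathcal{B}^\alpha_{p,q} \cap \mathcal{L}_{1,2,\infty} : \alpha^* \le \alpha < r + 1,\ p^* \le p \le 2,\ 1 \le q \le \infty\right\}.$$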

This result points out the minimax rate associated with $\mathcal{B}^\alpha_{p,q}(R) \cap \mathcal{L}_{1,2,\infty}(R')$ up to a logarithmic
term and in addition proves that it is of the same order as in the equivalent estimation problem on
$[0,1]$ (see [U).

It means that, roughly speaking, it is not harder to estimate sparse non-compactly
supported functions than sparse compactly supported functions from the minimax point of view.
In addition, the procedure $\tilde f_\gamma$ does the job up to a logarithmic term. When $p > 2$ (i.e., when dense
functions are considered), this conclusion does not remain true.
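The quantities entering Theorem 4 are easy to evaluate numerically. The following is a minimal sketch, not part of the original paper: the function names `rate`, `max_level` and `condition_4_2` are hypothetical, and the values of $n$, $\alpha$, $p$ and $c$ are illustrative assumptions. It computes $\rho_{n,\alpha}$, the level $j_0$ defined by $2^{j_0} \le n^c(\log n)^{-c} < 2^{j_0+1}$, and checks condition (4.2).

```python
import math


def rate(n: int, alpha: float) -> float:
    """rho_{n,alpha} = (log(n)/n)^(alpha/(1+2*alpha))."""
    return (math.log(n) / n) ** (alpha / (1.0 + 2.0 * alpha))


def max_level(n: int, c: float) -> int:
    """Integer j0 with 2^j0 <= n^c * log(n)^(-c) < 2^(j0+1)."""
    return math.floor(c * (math.log2(n) - math.log2(math.log(n))))


def condition_4_2(alpha: float, p: float, c: float) -> bool:
    """Condition (4.2): alpha * (1 - 1/(c*(1+2*alpha))) >= 1/p - 1/2."""
    return alpha * (1.0 - 1.0 / (c * (1.0 + 2.0 * alpha))) >= 1.0 / p - 0.5


# Illustrative values (assumptions, not taken from the paper).
n, c = 4096, 2.0
j0 = max_level(n, c)
squared_rate = rate(n, alpha=1.0) ** 2  # order of the risk bound (4.3)
print("j0 =", j0, " squared rate =", squared_rate)

# For alpha = 0.6 and p = 1, condition (4.2) fails with c = 1 but holds with c = 3,
# which illustrates the requirement "c large enough".
print(condition_4_2(0.6, 1.0, 1.0), condition_4_2(0.6, 1.0, 3.0))
```

Since $\gamma > c$ is required, a larger $c$ (needed for small $\alpha$ and $p$ close to 1) forces a larger threshold parameter.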



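4.3.2 Minimax estimation on Besov spaces $\mathcal{B}^\alpha_{p,q}$ when $p > 2$

Before considering the case of estimation of non-compactly supported functions, let us establish the
following result. We denote by $\mathcal{K}$ the set of compact sets of $\mathbb{R}$ containing a non-empty interval. We
define for $K \in \mathcal{K}$, $\mathcal{B}^\alpha_{p,q,K}(R)$ the set of functions supported by $K$ and belonging to $\mathcal{B}^\alpha_{p,q}(R)$.

Corollary 1. We assume that assumptions of Theorem 4 are true. For any $p \ge 1$, $\tilde f_\gamma$ achieves the
rate $\rho_\alpha$ on $\mathcal{B}^\alpha_{p,q,K}(R) \cap \mathcal{L}_{1,2,\infty}(R')$.

Furthermore, $\tilde f_\gamma$ is adaptive minimax up to a logarithmic term on

$$\left\{\mathcal{B}^\alpha_{p,q,K} \cap \mathcal{L}_{1,2,\infty} : \alpha^* \le \alpha < r + 1,\ p^* \le p \le \infty,\ 1 \le q \le \infty,\ K \in \mathcal{K}\right\},$$

where $\alpha^*$ and $p^*$ satisfy (4.4).

To prove this corollary, it is enough to apply Theorem 4 and to note that $\mathcal{B}^\alpha_{p,q,K}(R) \subset \mathcal{B}^\alpha_{p,\infty,K}(R) \subset
\mathcal{B}^\alpha_{2,\infty,K}(\tilde R)$ for $\tilde R$ large enough when $p > 2$.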

When non-compactly supported functions are considered, this result is not true and we can
prove the following theorem.

Theorem 5. Let $p > 2$ and $\alpha > 0$. There exists a positive function $f$ such that

$$f \in L_1 \cap L_2 \cap L_\infty \cap \mathcal{B}^\alpha_{p,\infty} \quad\text{and}\quad f \notin W_\alpha,$$

where the function spaces are viewed as sequential spaces (the Haar basis is used).

Remark 2. This result is established by using the Haar basis. We conjecture that it remains true
for more general biorthogonal wavelet bases.

This result proves that $\tilde f_\gamma$ does not achieve the rate $\rho_\alpha$ on $\mathcal{B}^\alpha_{p,\infty}$ when $p > 2$, showing that
minimax statements of Section 4.3.1 are not valid in this setting. As said previously, it seems to us
that minimax rates and adaptive minimax rates are unknown for $\mathcal{B}^\alpha_{p,q}$ when $2 < p < \infty$, even if
Donoho et al. [17] provided some lower bounds in the density framework. For the case $p = \infty$, see
[24].

Now, let us investigate the rate achieved by $\tilde f_\gamma$ on $\mathcal{B}^\alpha_{p,q}(R)$ when $p > 2$.
Theorem 6. Let $R, R' > 0$, $1 \le q \le \infty$, $2 < p \le \infty$ and $\alpha \in \mathbb{R}$ such that $1/(2p) < \alpha < r + 1$. Let
us define for any $n$, $j_0 = j_0(n)$ the integer such that

$$2^{j_0} \le n^c (\log n)^{-c} < 2^{j_0+1},$$

with $c \ge 1$. Then, $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$ defined with

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}$$

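and $\gamma > c$ achieves the following performance. For any $n$,

$$\sup_{f \in \mathcal{B}^\alpha_{p,q}(R) \cap \mathcal{L}_{1,2,\infty}(R')} \mathbb{E}\left(\|\tilde f_{n,\gamma} - f\|^2_{\tilde\varphi}\right) \le C(\gamma, c, R, R', \alpha, p, \varphi) \left(\frac{\log n}{n}\right)^{\alpha/(1+\alpha-1/p)},$$

where $C(\gamma, c, R, R', \alpha, p, \varphi)$ depends on $R'$, $\gamma$, $c$, $c'$, on the parameters of the Besov ball and on the
basis.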



Note that when $p = \infty$, the risk is bounded by $\left(\frac{\log n}{n}\right)^{\alpha/(1+\alpha)}$ up to a constant, which is the rate
of the minimax risk on $\mathcal{B}^\alpha_{\infty,\infty}(R)$ up to a logarithmic term in the density estimation setting (see
Theorem 1 of 0). However, when $p > 2$, $\frac{\alpha}{1+\alpha-1/p} < \frac{2\alpha}{1+2\alpha}$ and $\left(\frac{\log n}{n}\right)^{\alpha/(1+\alpha-1/p)} \gg \rho^2_{n,\alpha}$. So, $\tilde f_\gamma$ is probably not
adaptive minimax on the whole class of Besov spaces. However, we establish that our procedure is
adaptive minimax (with the exact power of the logarithmic factor) over weak Besov spaces without
any support assumption.



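4.3.3 Minimax estimation on $W_\alpha$ and adaptation with respect to $\alpha$

We investigate in this section a lower bound for the minimax risk on $W_\alpha(R) \cap \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty}(R') \cap \mathcal{L}_{1,2,\infty}(R'')$
for $R, R', R'' > 0$, viewed as sequence spaces for the Haar basis, and we set

$$\mathcal{R}\left(W_\alpha(R) \cap \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty}(R') \cap \mathcal{L}_{1,2,\infty}(R'')\right) = \inf_{\hat f}\ \sup_{f \in W_\alpha(R) \cap \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty}(R') \cap \mathcal{L}_{1,2,\infty}(R'')} \mathbb{E}\left(\|\hat f - f\|^2_{\tilde\varphi}\right).$$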

Theorem 7. For a > 0, we have 

liminf p- 2 a TZ(W a (R) n B^(R') n & Aoo (R")) > c(a)R^, 

where c(a) depends only on a, as soon as R" > 1 and R' > R x + 2a > 1. 

Using Theorem [U we immediately deduce the following result. 
Corollary 2. The procedure / 7 defined in Theorem^ with c = —d = 1 and with 7 > 1 is minimax 

a 

on W Q (R) n B^^* (R') H Ci,2,oo(R") and is adaptive minimax on 

\w a {R) n B^(R') n C 1X oo(R") ■ a > 0, 1 < R", 1 < R < #} . 

Remark 3. These results are established for the Haar basis. They are probably true for more general
biorthogonal wavelet bases, but we were not able to prove it.
5 How to choose the parameter $\gamma$

In this section, our goal is to find lower and upper bounds for the parameter $\gamma$. The aim and proofs
are inspired by Birgé and Massart [3] who considered penalized estimators and calibrated constants
for penalties in a Gaussian regression framework. In particular, they showed that if the penalty
constant is smaller than 1, then the penalized estimator behaves in a quite unsatisfactory way. This
study was used in practice to derive adequate data-driven penalties by Lebarbier (3o| .

We assume that the function $f$ to be estimated belongs to a restricted functional space. More
precisely, we assume that for $n$ large enough, $f$ belongs to $\mathcal{F}_n$ where for any $n$,

^„ = {/€L 1 nL a nL ft0 : F x > ^y°^ l Fx>0 , VA £ a}, 

with $F_\lambda = \int_{\mathrm{supp}(\varphi_\lambda)} f(x)\,dx$. Observe that $\mathcal{F}_n$ only contains functions with finite support. If the Haar
basis is considered, any function supported by $[0,1]$ that is constant on each interval of a dyadic
partition of $[0,1]$ belongs to $\mathcal{F}_n$ for $n$ large enough. In addition, the interest of the class $\mathcal{F}_n$ lies in
the natural bridge it constitutes between the model of this paper and the regression model for which
the number of non-zero coefficients is always bounded by $n$. These reasons justify the importance of
estimating functions of $\mathcal{F}_n$ well with an appropriate choice of $\gamma$. Throughout this
section we naturally consider the Haar basis and we define for any $n$, $j_0 = j_0(n)$ the integer such that $2^{j_0} \le n < 2^{j_0+1}$.
Then $\tilde f_{n,\gamma}$ is defined with

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}.$$

In the sequel, we prove that, roughly speaking, $\tilde f_{n,\gamma}$ cannot achieve good performance from the
oracle point of view if the parameter $\gamma$ is smaller than 1 or larger than 16.

5.1 Lower bound for $\gamma$

In this section, we provide a lower bound for the parameter $\gamma$. We have the following result.
Theorem 8. We estimate f = ho,i] £ Fn with /„ i7 such that in view of $2.3\) . we set 

I, log(n)n w 

00 ) 
n 



VAer n , r/ Aj7 = v/27log(n)Vx,n + Iva 



with (u n ) n a deterministic bounded sequence. Then for all e > 0, we obtain for any n, 

E(|/n, 7 -/||)>-^(l+On(l)). 

This result shows that we need $\gamma \ge 1$ to obtain a good convergence rate. Indeed, for any $n$,
Theorem 2 (established with $\gamma > 1$) gives the bound

E(||/n l7 -/|||)<C^, 



where C is a constant. 
5.2 Upper bound for $\gamma$

In this section, we provide an upper bound for the parameter $\gamma$. In Remark 1 we have already
noticed that the performance of $\tilde f_\gamma$ gets worse when $\gamma$ increases. More justifications of this point
are provided in this section.

Theorem 9. Let $\gamma = 1 + \sqrt{2}$ and let $\eta_{\lambda,\gamma}$ be as in Theorem 8. Then $\tilde f_{n,\gamma}$ achieves the following
oracle inequality: for $n$ large enough,

E(l/n,7-/l|) 

sup — — 5 — y — 121ogn. 



Tl 



Now, let us assume that for a choice of $\gamma$, say $\gamma_{\min}$, the corresponding threshold $\eta_{\lambda,\gamma_{\min}}$ leads to
satisfying results (for instance, Theorem 9 tells us that $\gamma = 1 + \sqrt{2}$ is a good choice). Then let us
fix $\gamma$ larger than $\gamma_{\min}$ and let us consider the estimator $\tilde f_{n,\gamma}$ associated with the threshold $\eta_{\lambda,\gamma}$ as
built in Theorem 8. Our goal is to obtain a lower bound of the maximal risk of $\tilde f_{n,\gamma}$ on $\mathcal{F}_n$ larger
than the upper bound obtained for $\eta_{\lambda,\gamma_{\min}}$. This means that choosing $\gamma$ is a bad choice. This goal
is reached in the following theorem.

Theorem 10. Let $\gamma_{\min} > 1$ be fixed and let $\gamma > \gamma_{\min}$. We still consider the thresholding rule
associated with $\gamma$ (see Theorem 8). Then,

E(||/n l7 -/|||) ^ , _ , n , 

sup ^ . ,ra T / s , 1 > (v 7 ^ - \/7m^) 21ogn(l + o„(l)). 

If we choose $\gamma_{\min} = 1 + \sqrt{2}$ and apply Theorem 9, the maximal oracle ratio of the estimator $\tilde f_{n,\gamma_{\min}}$
is not larger than $12\log n$. So, if $\gamma > 16$, which yields $(\sqrt{\gamma} - \sqrt{\gamma_{\min}})^2 > 6$, the resulting maximal
oracle ratio of $\tilde f_{n,\gamma}$ is larger than $12\log n$. In addition, note that the function used in Theorem 8 is
also in $\mathcal{F}_n$. So, finally the convenient value of $\gamma$ belongs to $[1, 16]$.
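The arithmetic behind the upper end of the interval $[1, 16]$ can be checked directly. The sketch below is an illustration rather than part of the paper; it assumes that the lower bound of Theorem 10 reads $2(\sqrt{\gamma} - \sqrt{\gamma_{\min}})^2 \log n\,(1 + o_n(1))$, which is what the sentence "$(\sqrt{\gamma} - \sqrt{\gamma_{\min}})^2 > 6$ gives a ratio larger than $12 \log n$" implies. The names `lower_factor` and `crossing` are hypothetical.

```python
import math

gamma_min = 1.0 + math.sqrt(2.0)  # value of gamma used in Theorem 9
upper_constant = 12.0             # Theorem 9: maximal oracle ratio <= 12 log n


def lower_factor(gamma: float, gamma_min: float) -> float:
    """Constant in front of log n in the lower bound: 2*(sqrt(gamma)-sqrt(gamma_min))^2."""
    return 2.0 * (math.sqrt(gamma) - math.sqrt(gamma_min)) ** 2


# Value of gamma at which the lower-bound constant equals the upper-bound constant 12.
crossing = (math.sqrt(upper_constant / 2.0) + math.sqrt(gamma_min)) ** 2

for g in (4.0, 16.0, 17.0, 25.0):
    print(g, lower_factor(g, gamma_min))
print("crossing point:", crossing)
```

Under this reading, the exact crossing point lies slightly above 16 (about 16.03), so the value 16 in the text is best understood as a rounded threshold; the $(1 + o_n(1))$ factor makes the distinction immaterial asymptotically.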



6 Simulations 

In this section, some simulations are provided and the performance of the thresholding rule is
measured from the numerical point of view. We also discuss the ideal choice for the parameter $\gamma$,
keeping in mind that the value $\gamma = 1$ constitutes a border for the theoretical results (see Section
5). For these purposes, the procedure is performed for estimating various intensity signals and the
wavelet set-up associated with biorthogonal wavelet bases is considered. More precisely, we focus
either on the Haar basis where

$$\phi = \tilde\phi = 1_{[0,1]}, \qquad \psi = \tilde\psi = 1_{[0,1/2]} - 1_{]1/2,1]},$$

or on a special case of spline systems given in Figure 1. This latter basis, called hereafter the spline
basis, has the following properties. First, the supports of $\phi$, $\psi$, $\tilde\phi$ and $\tilde\psi$ are included in $[-4,5]$. The
reconstruction wavelets $\tilde\phi$ and $\tilde\psi$ belong to $C^{1.272}$. Finally, the wavelet $\psi$ is a piecewise constant
function orthogonal to polynomials of degree 4 (see [IB]). So, such a basis has properties 1-5
required in Section 3 with $r = 0.272$. Then, the signal $f$ to be estimated is decomposed as follows:

$$f = \sum_{\lambda\in\Lambda} \beta_\lambda \tilde\varphi_\lambda = \sum_{k\in\mathbb{Z}} \beta_{-1,k}\,\tilde\phi_k + \sum_{j\ge 0}\sum_{k\in\mathbb{Z}} \beta_{j,k}\,\tilde\psi_{j,k}.$$
Figure 1: The spline basis. Top: $\phi$ and $\psi$. Bottom: $\tilde\phi$ and $\tilde\psi$.

For estimating $f$, we use the observations $(\hat\beta_\lambda)_{\lambda\in\Lambda}$ associated with a Poisson process $N$ whose
intensity with respect to the Lebesgue measure is $n \times f$. Since $\phi$ and $\psi$ are piecewise constant
functions, accurate values of the observations are available, which avoids many computational
and approximation issues that often arise in the wavelet setting. To shed light on typical
aspects of Poisson intensity estimation, Figure 2 displays the reconstruction obtained by using only
the coarsest noisy wavelet coefficients of a particular signal (the density of a Gaussian variable with
mean 0.5 and standard deviation 0.25) with $n = 4096$. We mean that $(\beta_{j,k})_{j\ge -1,\,k\in\mathbb{Z}}$ is estimated
by $(\hat\beta_{j,k})_{-1\le j\le 10,\,k\in\mathbb{Z}}$ without using thresholding. As expected, variability highly depends on the
local values of the signal. So, our framework is very different from classical regression where we
observe random variables with common variance. The thresholding rule considered in this section
is $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$ with $\tilde f_{n,\gamma}$ defined in (1.2) with

$$\Gamma_n = \{\lambda = (j,k) : -1 \le j \le j_0,\ k \in \mathbb{Z}\}$$

and

$$\eta_{\lambda,\gamma} = \sqrt{2\gamma\log(n)\,\hat V_{\lambda,n}} + \frac{\gamma\log(n)}{3n}\,\|\varphi_\lambda\|_\infty.$$

Observe that $\eta_{\lambda,\gamma}$ slightly differs from the threshold defined in (2.2) since $\tilde V_{\lambda,n}$ is now replaced with
$\hat V_{\lambda,n}$. Such a modification is natural in view of (2.3) and Theorem 8. In particular, it allows us to
derive the parameter $\gamma$ as an explicit function of the threshold. We guess that the performance
of our thresholding rule associated with the threshold $\eta_{\lambda,\gamma}$ defined in (2.2) is very close. Now, to
complete the definition of the estimate, we have to choose the parameters $j_0$ and $\gamma$. This choice is
crucial and is extensively discussed in the sequel. Using $n = 1024$, Figure 3 displays 9 examples
of intensity reconstructions obtained with $j_0 = \log_2(n) = 10$ and $\gamma = 1$. These functions are
respectively denoted 'Haar1', 'Haar2', 'Blocks', 'Comb', 'Gauss1', 'Gauss2', 'Beta0.5', 'Beta4' and
'Bumps' and have been chosen to represent the wide variety of signals arising in signal processing
(see the Appendix for a precise definition of each signal). Each of them satisfies $\|f\|_1 = 1$ and can be
classified according to the following criteria: the smoothness, the size of the support (finite/infinite),
the value of the sup norm (finite/infinite) and the shape (to be piecewise constant or a mixture of
peaks). In particular, the signal 'Comb' (respectively 'Beta0.5') is inspired by the construction of
the counter-example proposed in Theorem 5 (respectively Proposition 1).

Figure 2: Plots of the signal $f(x) = \frac{1}{0.25\sqrt{2\pi}} \exp\left(-\frac{(x-0.5)^2}{2\times 0.25^2}\right)$ and purely noisy reconstruction with
$n = 4096$ based on the wavelet coefficients until the level 10 and by using the Haar basis.
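The simulation pipeline described above can be sketched in a few lines for the Haar basis. This is a minimal illustration under stated assumptions, not the authors' code: the signal is taken to be $f = 1_{[0,1]}$ (a hypothetical stand-in, since the test signals are defined in the Appendix), the empirical coefficients are assumed to be $\hat\beta_\lambda = \frac{1}{n}\int \varphi_\lambda\, dN$ with $\hat V_{\lambda,n} = \frac{1}{n^2}\int \varphi_\lambda^2\, dN$, and the threshold is assumed to have the form $\sqrt{2\gamma\log(n)\hat V_{\lambda,n}} + \frac{\gamma\log n}{3n}\|\varphi_\lambda\|_\infty$. All function names are hypothetical.

```python
import math
import random


def haar_psi(j: int, k: int, x: float) -> float:
    """Haar wavelet psi_{j,k}(x) = 2^{j/2} psi(2^j x - k), psi = 1_[0,1/2] - 1_]1/2,1]."""
    y = (2 ** j) * x - k
    if 0.0 <= y <= 0.5:
        return 2.0 ** (j / 2)
    if 0.5 < y <= 1.0:
        return -(2.0 ** (j / 2))
    return 0.0


def poisson_points(n: int, rng: random.Random) -> list:
    """Points on [0, 1] of a Poisson process with intensity n * f, for f = 1_[0,1]."""
    points = []
    t = rng.expovariate(n)  # exponential gaps with rate n
    while t <= 1.0:
        points.append(t)
        t += rng.expovariate(n)
    return points


def threshold_coefficient(points, n, j, k, gamma):
    """Return (beta_hat, eta, kept value) for the Haar index (j, k)."""
    values = [haar_psi(j, k, t) for t in points]
    beta_hat = sum(values) / n                    # (1/n) * integral of psi_{j,k} dN
    v_hat = sum(v * v for v in values) / n ** 2   # (1/n^2) * integral of psi_{j,k}^2 dN
    sup_norm = 2.0 ** (j / 2)                     # sup norm of psi_{j,k}
    eta = (math.sqrt(2 * gamma * math.log(n) * v_hat)
           + gamma * math.log(n) * sup_norm / (3 * n))
    kept = beta_hat if abs(beta_hat) >= eta else 0.0  # hard thresholding
    return beta_hat, eta, kept


rng = random.Random(0)
n, gamma, j_max = 1024, 1.0, 5  # j_max kept small so the sketch runs quickly
points = poisson_points(n, rng)
results = [threshold_coefficient(points, n, j, k, gamma)
           for j in range(j_max + 1) for k in range(2 ** j)]
n_kept = sum(1 for _, _, kept in results if kept != 0.0)
print(len(points), "points;", n_kept, "of", len(results), "wavelet coefficients kept")
```

For this flat signal all true wavelet coefficients $\beta_{j,k}$, $j \ge 0$, vanish, so a well-calibrated threshold should discard most of the empirical coefficients; the threshold itself is random because it depends on $\hat V_{\lambda,n}$.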

More interestingly, numerical results are provided to answer the question about the choice of $\gamma$.
Given $n$ and a function $f$, we denote by $R_n(\gamma)$ the ratio between the $\ell_2$-performance of our procedure
(depending on $\gamma$) and the oracle risk where the wavelet coefficients at levels $j > j_0$ are omitted. We
have:

$$R_n(\gamma) = \frac{\sum_{\lambda\in\Gamma_n} (\tilde\beta_\lambda - \beta_\lambda)^2}{\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n})} = \frac{\sum_{\lambda\in\Gamma_n} \left(\hat\beta_\lambda 1_{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}} - \beta_\lambda\right)^2}{\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n})}.$$

Of course, $R_n$ is a stepwise function and the change points of $R_n$ correspond to the values of $\gamma$ such
that there exists $\lambda$ with $\eta_{\lambda,\gamma} = |\hat\beta_\lambda|$. The average over 1000 simulations of $R_n(\gamma)$ is computed, providing
an estimation of $\mathbb{E}(R_n(\gamma))$. This average ratio, denoted $\bar R_n(\gamma)$ and viewed as a function of $\gamma$, is
plotted for three signals 'Haar1', 'Gauss1' and 'Bumps' for $n \in \{64, 128, 256, 512, 1024, 2048, 4096\}$.
For non compactly supported signals, to compute the ratio, the wavelet coefficients associated with
the tails of the signals are omitted but we ensure that this approximation is negligible with respect
to the values of $R_n$. The parameter $j_0$ takes the value $j_0 = \log_2(n)$. Fixing $j_0 = \log_2(n)$ is natural
in view of Theorem [2] (applied with c = 1 and d = 0) and Theorem Figure H] displays R n for 
'Haar1' decomposed on the Haar basis. The left side of Figure 4 gives a general idea of the shape
of $\bar R_n$, while the right side focuses on small values of $\gamma$. Similarly, Figures 5 and 6 display $\bar R_n$ for
'Gauss1' decomposed on the spline basis and for 'Bumps' decomposed on the Haar and the spline
bases.
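The oracle ratio is a simple function of the true coefficients, the empirical coefficients, the thresholds and the variances. The sketch below is illustrative only: `oracle_ratio` is a hypothetical helper, the numerical values are made up, and the optional `log_factor` argument is an added convenience that multiplies the variance term in the denominator (set it to $\log n$ to penalize the oracle by a logarithmic factor).

```python
def oracle_ratio(beta, beta_hat, eta, v, log_factor=1.0):
    """Ratio of the thresholding loss to sum of min(beta^2, log_factor * V).

    beta, beta_hat, eta and v are sequences indexed by the same set of
    wavelet indices (true coefficients, empirical coefficients, thresholds
    and variances V_{lambda,n}).
    """
    loss = 0.0
    for b, bh, e in zip(beta, beta_hat, eta):
        kept = bh if abs(bh) >= e else 0.0  # hard-thresholded coefficient
        loss += (kept - b) ** 2
    oracle = sum(min(b * b, log_factor * vi) for b, vi in zip(beta, v))
    return loss / oracle


# Toy values (assumptions for illustration only).
beta = [1.0, 0.0, 0.2]
beta_hat = [0.9, 0.05, 0.25]
eta = [0.3, 0.3, 0.3]
v = [0.01, 0.01, 0.01]
ratio = oracle_ratio(beta, beta_hat, eta, v)
print(ratio)
```

Because each coefficient is either kept or set to zero, the ratio changes only when $\gamma$ crosses a value at which some threshold equals $|\hat\beta_\lambda|$, which is why $R_n$ is a step function of $\gamma$. Note that the denominator vanishes when all true coefficients are zero, so the helper is meant for signals with at least one non-zero coefficient.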
Figure 4: The function $\gamma \mapsto \bar R_n(\gamma)$ at two scales for 'Haar1' decomposed on the Haar basis and for
$n \in \{64, 128, 256, 512, 1024, 2048, 4096\}$ with $j_0 = \log_2(n)$.

Figure 5: The function $\gamma \mapsto \bar R_n(\gamma)$ for 'Gauss1' decomposed on the spline basis and for $n \in
\{64, 128, 256, 512, 1024, 2048, 4096\}$ with $j_0 = \log_2(n)$.

Figure 6: The function $\gamma \mapsto \bar R_n(\gamma)$ for 'Bumps' decomposed on the Haar and the spline bases and
for $n \in \{64, 128, 256, 512, 1024, 2048, 4096\}$ with $j_0 = \log_2(n)$.

To discuss our results, we introduce

$$\gamma_{\min}(n) = \mathrm{argmin}_{\gamma > 0}\, \bar R_n(\gamma).$$

For 'Haar1', $\gamma_{\min}(n) > 1$ for any value of $n$ and taking $\gamma < 1$ deteriorates the performance of
the estimate. Such a result was established from the theoretical point of view in Theorem 8. In
fact, Figure 4 allows us to draw the following major conclusion for 'Haar1':

$$\bar R_n(\gamma) \approx \bar R_n(\gamma_{\min}) \approx 1 \qquad (6.1)$$

for a wide range of $\gamma$ around $\gamma_{\min} > 1$ that contains $\gamma = 1$. For instance, when $n = 4096$, the
minimum of $\bar R_n$, close to 1, is very flat and the minimizer is surrounded by the "plateau" $[1, 177]$.
So, the values of $\gamma_{\min}(n)$ should not be considered as sacred. Our thresholding rule with $\gamma = 1$
performs very well since it achieves the same performance as the oracle estimator.

For 'Gauss1', $\gamma_{\min}(n) > 0.5$ for any value of $n$. Moreover, as soon as $n$ is large enough, the
oracle ratio at $\gamma_{\min}$ is of order 1. Besides, when $n \ge 2048$, as for 'Haar1', $\gamma_{\min}(n)$ is larger than 1.
We observe the "plateau phenomenon" as well and, as for 'Haar1', the size of the plateau increases
when $n$ increases. This can be explained by the following important property of 'Gauss1': it
can be well approximated by a finite combination of the atoms of the spline basis. So, we have the
strong impression that the asymptotic result of Theorem 8 could be generalized to the spline basis
as soon as we can build positive signals decomposed on the spline basis.

Conclusions for 'Bumps' are very different. Note that this irregular signal has many significant
wavelet coefficients at high resolution levels whatever the basis. We have $\gamma_{\min}(n) < 0.5$ for each
value of $n$. Besides, $\gamma_{\min}(n) \approx 0$ when $n \le 256$, meaning that all the coefficients until $j = j_0$ have to
be kept to obtain the best estimate. So, the parameter $j_0$ plays an essential role and has to be well
calibrated to ensure that there are no non-negligible wavelet coefficients for $j > j_0$. Other differences
between Figure 4 (or Figure 5) and Figure 6 have to be emphasized. For 'Bumps', when $n \ge 512$,
the minimum of $\bar R_n$ is well localized, there is no plateau anymore and $\bar R_n(1) > 2$ ($\bar R_n(\gamma_{\min}(n))$ is
larger than 1).

As a preliminary conclusion, it seems that the ideal choice of $\gamma$ and the performance of the
thresholding rule highly depend on the decomposition of the signal on the wavelet basis. Hence, in
the sequel, we have decided to force $j_0 = 10$ so that the decomposition on the basis is not too rough.
To extend previous results and for the sake of exhaustiveness, Figures 7 and 8 display the average
of the function $R_n$ for the signals 'Haar1', 'Haar2', 'Blocks', 'Comb', 'Gauss1', 'Gauss2', 'Beta0.5',
'Beta4' and 'Bumps' with $j_0 = 10$. For brevity, we only consider the values $n \in \{64, 256, 1024, 4096\}$
and the average of $R_n$ is performed over 100 simulations. Note also that we fix $j_0 = 10$ and 100
simulations (and not larger parameters) because computational difficulties arise when we deal with
infinite support for heavy-tailed signals ('Beta4' and 'Comb') and for a wide range of $\gamma$. Figure 7
gives the results obtained for the Haar basis and Figure 8 for the spline basis. To interpret the
results, we introduce

Figure 7: Average over 100 iterations of the function $R_n$ for signals decomposed on the Haar basis
and for $n \in \{64, 256, 1024, 4096\}$ with $j_0 = 10$.

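$$R^{\log}_n(\gamma) = \frac{\sum_{\lambda\in\Gamma_n} (\tilde\beta_\lambda - \beta_\lambda)^2}{\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}\log(n))} = \frac{\sum_{\lambda\in\Gamma_n} \left(\hat\beta_\lambda 1_{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}} - \beta_\lambda\right)^2}{\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}\log(n))},$$

where the denominator appears in the upper bound of Theorem 2. We also measure the $\ell_2$-performance
of the estimator by using

$$r_n(\gamma) = \sum_{\lambda\in\Gamma_n} (\tilde\beta_\lambda - \beta_\lambda)^2 = \sum_{\lambda\in\Gamma_n} \left(\hat\beta_\lambda 1_{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}} - \beta_\lambda\right)^2.$$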

Table 1 gives, for each signal and for $n \in \{64, 256, 1024, 4096\}$, the average of $r_n(1)$, denoted $\bar r_n(1)$,
the average of $R_n(1)$, denoted $\bar R_n(1)$, and the average of $R^{\log}_n(1)$, denoted $\bar R^{\log}_n(1)$ (100 simulations
are performed). In view of Table 1, let us introduce two classes of functions. The first class is the
class of signals that are well approximated by a finite combination of the atoms of the basis (it
contains 'Haar1', 'Haar2' and 'Comb' for the Haar basis and 'Gauss1' and 'Gauss2' for the spline
basis). For such signals, the estimation problem is close to a parametric problem and in this case
the performance of the oracle estimate can be achieved at least for $n$ large enough and (6.1) is true
for a wide range of $\gamma$ around $\gamma_{\min}$ that contains $\gamma = 1$. The second class is the class of irregular
signals with significant wavelet coefficients at high resolution levels (it contains all the other cases
except 'Beta0.5'). For such signals, Table 1 shows that $\bar R_n(1)$ seems to increase with $n$. But
$\bar R^{\log}_n(1)$ remains constant, showing that the upper bound (with the logarithmic term) of Theorem
2 is probably achieved up to a constant. 'Beta0.5' has only one significant coefficient at each level.
This may explain why its behavior seems to be between the first and second class behavior. Finally
let us note that the oracle ratio curve for 'Bumps', $j_0 = 10$ and $n = 4096$ has a minimizer $\gamma_{\min}$ close
to 0 and has a different behavior from the one with $j_0 = 12$ (see Figure 6). It illustrates again the
fact that 'Bumps' has still some important coefficients at the level of resolution $j = 12$ that can be
taken into account if $\log_2(n) = 12$.

Figure 8: Average over 100 iterations of the function $R_n$ for signals decomposed on the spline basis
and for $n \in \{64, 256, 1024, 4096\}$ with $j_0 = 10$.

Finally, we would like to emphasize the following conclusions. The performance of our thresholding
rule is suitable since the ratio $\bar R_n(1)$ is controlled. Moreover a convenient choice of the basis
improves this ratio but also the performance of the estimator itself. Furthermore, the size of the
support does not play any role (compare estimation of 'Comb' and 'Haar1' for instance) and the
estimate $\tilde f_{n,1}$ performs well for recovering the size and location of peaks.



7 Proofs 



In this section, the notation $\square$ represents an absolute constant whose value may change at each line.
For any $x > 0$, the notation $\lceil x \rceil$ denotes the smallest integer larger than $x$. Notations of Sections 2
and 3 are used. Recall also that we have set

$$\forall \lambda \in \Lambda, \quad F_\lambda = \int_{\mathrm{supp}(\varphi_\lambda)} f(x)\,dx.$$

                       Haar basis                                     Spline basis
Signal    n            r_n(1)               R_n(1)      R_n^log(1)    r_n(1)                R_n(1)      R_n^log(1)
Haar1     64           0.016                1.0         0.2           0.10                  1.4         0.7
          1024         0.0042 / [illegible] 1.1         0.2           0.068                 2.0         0.8
          4096         0.0002               1.0         0.2           0.016                 3.5         0.7
Haar2     64           0.082                2.6         0.6           0.21                  2.1         1.0
          1024         0.026                3.3         0.6           0.085                 1.8         0.7
          4096         0.0004               1.0         0.1           0.026                 2.9         0.8
Blocks    64           0.31                 1.4         0.9           0.27                  1.4         0.9
          256          0.26                 2.5         1.0           0.21                  1.9         1.0
          1024         0.13                 2.9         0.9           0.13                  2.6         0.9
          4096         0.053                3.7         0.8           0.063                 3.2         0.8
Comb      64           0.61                 1.7         0.4           1.71                  1.8         0.8
          1024         0.12                 1.3 / 1.4   0.2           0.78                  1.7 / 2.7   0.7
          4096         0.0063               1.1         0.1           0.23                  4.0         0.7
Gauss1    64           0.21                 2.3         0.9           0.10                  2.1         0.7
          1024         0.072                1.8         0.7 / 0.7     0.060 / [illegible]   4.5         0.9
          4096         0.018                2.9         0.7           0.0017                1.2         0.2
Gauss2    64           0.17                 1.9         0.7           0.12                  2.1         0.7
          1024         0.07                 2.0         0.6           0.05                  3.1         0.6
          4096         0.015                3.0         0.7           0.0017                1.2         0.2
Beta0.5   64           1.6                  1.7         1.0           2.2                   1.9         1.0
          1024         1.1                  3.4         1.0           1.4                   3.8         1.0
          4096         0.045                1.6         0.3           0.066                 2.3         0.3
Beta4     64           0.25                 2.1         0.8           0.36                  2.2         0.9
          256          0.093                2.0         0.6           0.16                  2.5         0.8
          1024         0.041                2.2         0.6           0.061                 2.7         0.7
          4096         0.020                2.8         0.7           0.024                 3.3         0.6
Bumps     64           4.9                  1.8         1.0           4.3                   2.0         1.1
          256          3.1                  2.5         1.0           2.5                   2.7         1.0
          1024         1.5                  3.0         0.9           1.2                   3.4         0.9
          4096         0.62                 3.4         0.7           0.38                  3.0         0.6

Table 1: Values of $\bar r_n(1)$, $\bar R_n(1)$ and $\bar R^{\log}_n(1)$ for each signal decomposed on the Haar basis or the
spline basis and for $n \in \{64, 256, 1024, 4096\}$.



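7.1 Proof of Theorem 1

Let $\gamma, p, q, \varepsilon$ be as in Theorem 1. We start as usual for model selection with (1.3). One has for all
subset $m$ of $\Gamma_n$

$$\gamma_n(\tilde f_{n,\gamma}) + \mathrm{pen}(\hat m) \le \gamma_n(\hat f_m) + \mathrm{pen}(m).$$

If $g = \sum_{\lambda\in\Lambda} \alpha_\lambda \tilde\varphi_\lambda$, setting $\nu_n(g) = \sum_{\lambda\in\Lambda} \alpha_\lambda(\hat\beta_\lambda - \beta_\lambda)$, we obtain that

$$\gamma_n(g) = \|g - f\|^2_{\tilde\varphi} - \|f\|^2_{\tilde\varphi} - 2\nu_n(g).$$

Hence,

$$\|\tilde f_{n,\gamma} - f\|^2_{\tilde\varphi} \le \|\hat f_m - f\|^2_{\tilde\varphi} + 2\nu_n(\tilde f_{n,\gamma} - \hat f_m) + \mathrm{pen}(m) - \mathrm{pen}(\hat m).$$

For any subset of indices $m'$, let $\chi(m') = \sqrt{\sum_{\lambda\in m'} (\hat\beta_\lambda - \beta_\lambda)^2}$ and let $f_m = \sum_{\lambda\in m} \beta_\lambda \tilde\varphi_\lambda$ be the
orthogonal projection of $f$ on $S_m$ for $\|\cdot\|_{\tilde\varphi}$. Then $\chi^2(m) = \nu_n(\hat f_m - f_m) = \|\hat f_m - f_m\|^2_{\tilde\varphi} = \|\hat f_m - f\|^2_{\tilde\varphi} - \|f_m - f\|^2_{\tilde\varphi}$. Hence,

$$\|\tilde f_{n,\gamma} - f\|^2_{\tilde\varphi} \le \|f_m - f\|^2_{\tilde\varphi} - \chi^2(m) + 2\nu_n(\tilde f_{n,\gamma} - f_m) + \mathrm{pen}(m) - \mathrm{pen}(\hat m).$$

Furthermore,

$$\nu_n(\tilde f_{n,\gamma} - f_m) \le \|\tilde f_{n,\gamma} - f_m\|_{\tilde\varphi}\, \chi(m \cup \hat m) \le \|\tilde f_{n,\gamma} - f\|_{\tilde\varphi}\, \chi(m \cup \hat m) + \|f_m - f\|_{\tilde\varphi}\, \chi(m \cup \hat m).$$

Using twice the fact that $2ab \le \theta a^2 + \theta^{-1} b^2$, for $\theta = 2/(2+\varepsilon)$ and $\theta = 2/\varepsilon$, we obtain that

$$2\nu_n(\tilde f_{n,\gamma} - f_m) \le \frac{2}{2+\varepsilon}\|\tilde f_{n,\gamma} - f\|^2_{\tilde\varphi} + \frac{2}{\varepsilon}\|f_m - f\|^2_{\tilde\varphi} + (1+\varepsilon)\chi^2(m \cup \hat m).$$

Hence we obtain that

$$\frac{\varepsilon}{2+\varepsilon}\|\tilde f_{n,\gamma} - f\|^2_{\tilde\varphi} \le \left(1 + \frac{2}{\varepsilon}\right)\sum_{\lambda\notin m} \beta_\lambda^2 + (1+\varepsilon)\chi^2(m \cup \hat m) - \chi^2(m) + \mathrm{pen}(m) - \mathrm{pen}(\hat m).$$

But $\chi^2(m \cup \hat m) \le \chi^2(m) + \chi^2(\hat m)$. After integration it remains to control

$$A = \mathbb{E}\left((1+\varepsilon)\chi^2(\hat m) - \mathrm{pen}(\hat m)\right).$$

Since

$$\hat m = \left\{\lambda \in \Gamma_n : |\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}\right\},$$

we have

$$\mathrm{pen}(\hat m) = \sum_{\lambda\in\Gamma_n} \eta^2_{\lambda,\gamma}\, 1_{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}}.$$

Hence,

$$A = \sum_{\lambda\in\Gamma_n} \mathbb{E}\left(\left[(1+\varepsilon)(\hat\beta_\lambda - \beta_\lambda)^2 - \eta^2_{\lambda,\gamma}\right] 1_{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}}\right)
\le \sum_{\lambda\in\Gamma_n} \mathbb{E}\left((1+\varepsilon)(\hat\beta_\lambda - \beta_\lambda)^2\, 1_{(1+\varepsilon)(\hat\beta_\lambda - \beta_\lambda)^2 \ge \eta^2_{\lambda,\gamma}}\, 1_{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}}\right).$$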
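Then, remark that if $|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}$ then $|\hat\beta_\lambda| \ge \frac{\mu\log n}{n}\|\varphi_\lambda\|_\infty$, where $\mu = [\sqrt{6} + 1/3]\gamma$, but also that
$|\hat\beta_\lambda| \le \frac{\|\varphi_\lambda\|_\infty}{n} N_\lambda$, hence $N_\lambda \ge \mu\log n$, where

$$N_\lambda = \int_{\mathrm{supp}(\varphi_\lambda)} dN.$$

So, one can split $A$ and bound this term by $LDLM + LDSM$, where

$$LDLM = \sum_{\lambda\in\Gamma_n} \mathbb{E}\left((1+\varepsilon)(\hat\beta_\lambda - \beta_\lambda)^2\, 1_{(1+\varepsilon)(\hat\beta_\lambda - \beta_\lambda)^2 \ge \eta^2_{\lambda,\gamma}}\, 1_{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}}\, 1_{N_\lambda \ge \mu\log n}\, 1_{nF_\lambda \ge \theta\mu\log n}\right)$$

and

$$LDSM = \sum_{\lambda\in\Gamma_n} \mathbb{E}\left((1+\varepsilon)(\hat\beta_\lambda - \beta_\lambda)^2\, 1_{(1+\varepsilon)(\hat\beta_\lambda - \beta_\lambda)^2 \ge \eta^2_{\lambda,\gamma}}\, 1_{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}}\, 1_{N_\lambda \ge \mu\log n}\, 1_{nF_\lambda \le \theta\mu\log n}\right),$$

where $\theta < 1$ is a parameter that is chosen later on. Here, LDLM stands for "large deviation large
mass" and LDSM stands for "large deviation small mass". Let us begin with LDLM. By the
Hölder inequality,

$$LDLM \le \sum_{\lambda\in\Gamma_n} (1+\varepsilon)\left[\mathbb{E}\left(|\hat\beta_\lambda - \beta_\lambda|^{2p}\right)\right]^{1/p}\, \mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \ge \eta_{\lambda,\gamma}/\sqrt{1+\varepsilon}\right)^{1/q}\, 1_{nF_\lambda \ge \theta\mu\log n}.$$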

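Before going further, let us state the following useful lemma:
Lemma 1. For any $u > 0$,

$$\mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \ge \sqrt{2uV_{\lambda,n}} + \frac{\|\varphi_\lambda\|_\infty u}{3n}\right) \le 2e^{-u}. \qquad (7.1)$$

Moreover, for any $u > 0$,

$$\mathbb{P}\left(V_{\lambda,n} \ge \tilde V_{\lambda,n}(u)\right) \le e^{-u}, \qquad (7.2)$$

where

$$\tilde V_{\lambda,n}(u) = \hat V_{\lambda,n} + \sqrt{2\hat V_{\lambda,n}\frac{\|\varphi_\lambda\|^2_\infty}{n^2}u} + 3\frac{\|\varphi_\lambda\|^2_\infty}{n^2}u.$$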
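Inequality (7.1) can be illustrated by a small Monte Carlo experiment in the simplest possible case. The sketch below is not part of the proof: it assumes the Haar father wavelet $\varphi_\lambda = 1_{[0,1]}$ and the signal $f = 1_{[0,1]}$, so that $\hat\beta_\lambda = N([0,1])/n$ with $N([0,1])$ Poisson of mean $n$, $\beta_\lambda = 1$, $\|\varphi_\lambda\|_\infty = 1$, and $V_{\lambda,n} = 1/n$ (taking $V_{\lambda,n} = \frac{1}{n}\int \varphi_\lambda^2 f$, an assumption about the notation). The function names are hypothetical.

```python
import math
import random


def poisson_count(mean: float, rng: random.Random) -> int:
    """Number of points on [0, 1] of a Poisson process with rate `mean`."""
    count = 0
    t = rng.expovariate(mean)  # exponential gaps with rate `mean`
    while t <= 1.0:
        count += 1
        t += rng.expovariate(mean)
    return count


def empirical_tail(n: int, u: float, trials: int, rng: random.Random) -> float:
    """Frequency of the event |beta_hat - beta| >= sqrt(2 u V) + u / (3 n)."""
    bound = math.sqrt(2.0 * u / n) + u / (3.0 * n)
    hits = sum(abs(poisson_count(n, rng) / n - 1.0) >= bound for _ in range(trials))
    return hits / trials


rng = random.Random(1)
for u in (0.5, 1.0, 2.0):
    freq = empirical_tail(n=50, u=u, trials=2000, rng=rng)
    print("u =", u, " empirical frequency =", freq, " bound 2*exp(-u) =", 2 * math.exp(-u))
```

In this toy case the empirical frequencies sit well below $2e^{-u}$, which is expected since the Bernstein-type bound is not tight for moderate $u$.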



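Proof. Equation (7.1) easily comes from the classical inequalities (see Kingman's book [26] or
Equation (5.2) of [31]). The same classical inequalities applied to $-\varphi_\lambda^2/n^2$ instead of $\varphi_\lambda/n$ give that

$$\mathbb{P}\left(V_{\lambda,n} \ge \hat V_{\lambda,n} + \sqrt{2u\int_{\mathbb{R}} \frac{\varphi_\lambda^4(x)}{n^4}\, n f(x)\,dx} + \frac{\|\varphi_\lambda\|^2_\infty}{3n^2}u\right) \le e^{-u}.$$

But one can remark that

$$\int_{\mathbb{R}} \frac{\varphi_\lambda^4(x)}{n^4}\, n f(x)\,dx \le \frac{\|\varphi_\lambda\|^2_\infty}{n^2} V_{\lambda,n}.$$

Set $a = u\|\varphi_\lambda\|^2_\infty/n^2$, then

$$\mathbb{P}\left(V_{\lambda,n} - \sqrt{2V_{\lambda,n}a} - a/3 \ge \hat V_{\lambda,n}\right) \le e^{-u}.$$

Let $\mathcal{P}(x) = x^2 - \sqrt{2a}\,x - a/3$. The discriminant of this polynomial is $10a/3$, which is strictly
larger than $2a$. Since $V_{\lambda,n}$ and $\hat V_{\lambda,n}$ are positive, this means that one can invert the equation
$\mathcal{P}(\sqrt{V_{\lambda,n}}) = \hat V_{\lambda,n}$ and we obtain

$$\mathbb{P}\left(\sqrt{V_{\lambda,n}} \ge \mathcal{P}^{-1}(\hat V_{\lambda,n})\right) \le e^{-u}.$$

But $\mathcal{P}^{-1}(\hat V_{\lambda,n})$ is the positive solution of

$$\left(\mathcal{P}^{-1}(\hat V_{\lambda,n})\right)^2 - \sqrt{2a}\,\mathcal{P}^{-1}(\hat V_{\lambda,n}) - (a/3 + \hat V_{\lambda,n}) = 0.$$

So, finally, $\mathcal{P}^{-1}(\hat V_{\lambda,n}) = \sqrt{\hat V_{\lambda,n} + 5a/6} + \sqrt{a/2}$ and

$$\left(\mathcal{P}^{-1}(\hat V_{\lambda,n})\right)^2 = \hat V_{\lambda,n} + \sqrt{2a\hat V_{\lambda,n} + 5a^2/3} + 5a/6 + a/2.$$

To conclude it remains to remark that $\tilde V_{\lambda,n}(u) \ge \left(\mathcal{P}^{-1}(\hat V_{\lambda,n})\right)^2$.

Using Equations (7.1) and (7.2) of Lemma 1 we have

$$\begin{aligned}
\mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \ge \eta_{\lambda,\gamma}/\sqrt{1+\varepsilon}\right)
&\le \mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \ge \sqrt{\frac{2\gamma\log n}{1+\varepsilon}\tilde V_{\lambda,n}(\gamma\log n)} + \frac{\gamma\log n\,\|\varphi_\lambda\|_\infty}{3(1+\varepsilon)n}\right) \\
&\le \mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \ge \sqrt{\frac{2\gamma\log n}{1+\varepsilon}\tilde V_{\lambda,n}(\gamma\log n)} + \frac{\gamma\log n\,\|\varphi_\lambda\|_\infty}{3(1+\varepsilon)n},\ V_{\lambda,n} \ge \tilde V_{\lambda,n}(\gamma\log n)\right) \\
&\quad + \mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \ge \sqrt{\frac{2\gamma\log n}{1+\varepsilon}\tilde V_{\lambda,n}(\gamma\log n)} + \frac{\gamma\log n\,\|\varphi_\lambda\|_\infty}{3(1+\varepsilon)n},\ V_{\lambda,n} < \tilde V_{\lambda,n}(\gamma\log n)\right) \\
&\le \mathbb{P}\left(V_{\lambda,n} \ge \tilde V_{\lambda,n}(\gamma\log n)\right) + \mathbb{P}\left(|\hat\beta_\lambda - \beta_\lambda| \ge \sqrt{\frac{2\gamma\log n}{1+\varepsilon}V_{\lambda,n}} + \frac{\gamma\log n\,\|\varphi_\lambda\|_\infty}{3(1+\varepsilon)n}\right) \\
&\le n^{-\gamma} + 2n^{-\gamma/(1+\varepsilon)} \\
&\le 3n^{-\gamma/(1+\varepsilon)}.
\end{aligned}$$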
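The algebra that closes the proof of Lemma 1 can be verified numerically. The following sketch is an illustration only (the helper names `p_inverse` and `v_tilde` are hypothetical): it checks that $\sqrt{\hat V + 5a/6} + \sqrt{a/2}$ is the positive root of $x^2 - \sqrt{2a}\,x - (a/3 + \hat V) = 0$, that its square has the stated closed form, and that it is dominated by $\hat V + \sqrt{2\hat V a} + 3a$, which is $\tilde V_{\lambda,n}(u)$ when $a = u\|\varphi_\lambda\|_\infty^2/n^2$.

```python
import math


def p_inverse(v_hat: float, a: float) -> float:
    """Positive root x of x^2 - sqrt(2a) x - (a/3 + v_hat) = 0."""
    return math.sqrt(a / 2.0) + math.sqrt(v_hat + 5.0 * a / 6.0)


def v_tilde(v_hat: float, a: float) -> float:
    """v_hat + sqrt(2 v_hat a) + 3a, with a = u * ||phi||_inf^2 / n^2."""
    return v_hat + math.sqrt(2.0 * v_hat * a) + 3.0 * a


checks = []
for v_hat, a in [(0.0, 1.0), (0.3, 0.01), (2.0, 0.5), (1e-4, 3.0)]:
    x = p_inverse(v_hat, a)
    residual = x * x - math.sqrt(2.0 * a) * x - (a / 3.0 + v_hat)
    closed_form = v_hat + math.sqrt(2.0 * a * v_hat + 5.0 * a * a / 3.0) + 5.0 * a / 6.0 + a / 2.0
    checks.append((abs(residual) < 1e-9,
                   abs(x * x - closed_form) < 1e-9,
                   v_tilde(v_hat, a) >= x * x))
print(checks)
```

The domination follows from $\sqrt{2a\hat V + 5a^2/3} \le \sqrt{2a\hat V} + a\sqrt{5/3}$ and $\sqrt{5/3} + 4/3 \le 3$, which is why the constant 3 appears in $\tilde V_{\lambda,n}(u)$.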

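We need another lemma which looks like the Rosenthal inequality.

Lemma 2. For all $p \ge 2$, there exists some absolute constant $C$ such that

$$\mathbb{E}\left(|\hat\beta_\lambda - \beta_\lambda|^{2p}\right) \le C^p p^{2p}\left(V_{\lambda,n}^p + \left[\frac{\|\varphi_\lambda\|_\infty}{n}\right]^{2p-2} V_{\lambda,n}\right).$$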

Proof. We know that a Poisson process is infinitely divisible. This means that for every positive
integer $k$ one can see $N$ as the union of $k$ i.i.d. Poisson processes $N^i$, each with intensity (here) $nk^{-1} \times f$
with respect to the Lebesgue measure. Hence, one can apply Rosenthal inequalities for all $k$, saying
that

$$\hat\beta_\lambda - \beta_\lambda = \sum_{i=1}^k \int \frac{\varphi_\lambda(x)}{n}\left(dN^i_x - nk^{-1}f(x)\,dx\right) = \sum_{i=1}^k Y_i,$$

where for any $i$,

$$Y_i = \int \frac{\varphi_\lambda(x)}{n}\left(dN^i_x - nk^{-1}f(x)\,dx\right).$$

So the $Y_i$'s are i.i.d. centered variables, all having a moment of order $2p$. We apply Rosenthal's
inequality (see Theorem 2.5 of ji^]) on the positive and negative parts of $Y_i$. This easily implies
that

k 



E 



2p\ 



< K(p) max (Uj2 Y A > ( E E l^! 2P ) ) 



where 



2p 



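It remains to bound the upper limit of $\mathbb{E}\left(\sum_{i=1}^k |Y_i|^q\right)$ for all $q \in \{2p, 2\}$ when $k \to \infty$. Let us
introduce

$$\Omega_k = \left\{\forall i \in \{1, \ldots, k\},\ N^i_{\mathbb{R}} \le 1\right\}.$$

Then, it is easy to see that $\mathbb{P}(\Omega_k^c) \le k^{-1}(n\|f\|_1)^2$ (see e.g., (7.5) below).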



On fi fc , = O fc (fc-«) if f^dN^ = and \Y^ 



\<Px(T)\ 



+ o k k- 1 



j £Mg) dJSfi — ^M-* - 1 where T is the point of the process N l . Consequently, 



\<Px(T)\ 



q-1 



if 



i=i 



+ O fc AT 1 



Iva(T)| 



+ W k (k' 



\ 



E 



Era 



vi=l 



(7.3) 



But, 



i=i 



vi=l 



n 



(J\£)« + AT 1 / l^^l/^Jdx 



< 2 9 



-i 



n 



N^ + k^k- 1 j \<p x (x)\f(x)dx 



So, when fc — ► +oo, the last term in (|7.3p converges to since a Poisson variable has moments of 
every order and 



lim sup E^l^l 9 <E 



k— >oo 



i=l 



IVA(ac)| 



I^aI 



which concludes the proof. 
Since 



n 



2p-2 



V\ n < max V? 



A,n> 



I^aI 



9-2 



A,n> 



there exists some constant C such that 

E(|/3 A -/3A| 2p )<C'V P K ri + 



n 



2pN 
But,



Finally, 

LDLM < □(l + e)p 2 n-^ 1+£ )) E L +(^) 

Since ||v^a||oo < c&nVn f° r a ll A G T n , one has 

LDLM < D(l + e)p 2 4 n n-^^( 1+£ )) E (V A + W A >^i og „ 

< □(! + £ > 2 4, n n-V(,(i +£ )) f £ Fa + 1 £ 

^' I ^— ' n t—' 9ulogn 

\Aer n Aer n r to 

E ^ = E /(^ 

supp(ip \) • 

(7.4) 

Aer n Aer n ^ ^ Aer„ 

Using (|2.ip . we then have 

E F * ^ l/llim^logn. 
Aer„ 

This is exactly what we need for the first part provided that 9 is an absolute constant and u > I. 
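Now we go back to LDSM. Applying the Hölder inequality again one obtains,

$$LDSM \le (1+\varepsilon)\sum_{\lambda\in\Gamma_n} \left[\mathbb{E}\left(|\hat\beta_\lambda - \beta_\lambda|^{2p}\right)\right]^{1/p}\, \mathbb{P}\left(N_\lambda - nF_\lambda \ge (1-\theta)\mu\log n\right)^{1/q}\, 1_{nF_\lambda \le \theta\mu\log n}.$$

To deal with this term, we state the following result.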

Lemma 3. There exists an absolute constant < 9 < 1 such that if nF\ < OuXogn, then, for all n 
such that (1 — 6>)ulogn > 2, 

P(iV A - nF x > (1 - 9)fAogn) < F A n~ 7 . 



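Proof. We use the same classical inequalities (see Kingman's book [26] or equation (5.2) of [31]):

$P(N_\lambda - nF_\lambda \ge (1-\theta)\mu\log n) \le \exp\left(-\frac{((1-\theta)\mu\log n)^2}{2(nF_\lambda + (1-\theta)\mu\log n/3)}\right) \le n^{-\frac{3(1-\theta)^2}{2(2\theta+1)}\mu}.$

If $nF_\lambda \ge n^{-\gamma-1}$, then provided that $\frac{3(1-\theta)^2}{2(2\theta+1)}\mu \ge 2\gamma + 2$, one has the result. This imposes the value
of $\theta$. Indeed since

$\frac{3(1-\theta)^2}{2(2\theta+1)}\mu = \frac{3(1-\theta)^2}{2(2\theta+1)}(\sqrt{6} + 1/3)\gamma,$

one takes $\theta$ such that

$\frac{3(1-\theta)^2}{2(2\theta+1)}(\sqrt{6} + 1/3) = 4.$

If $nF_\lambda \le n^{-\gamma-1}$,

$P(N_\lambda - nF_\lambda \ge (1-\theta)\mu\log n) \le P(N_\lambda > (1-\theta)\mu\log n) \le P(N_\lambda \ge 2)$

$\le \sum_{k\ge 2} \frac{(nF_\lambda)^k}{k!} e^{-nF_\lambda} \le (nF_\lambda)^2 \le F_\lambda n^{-\gamma}.$ (7.5)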
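
The last step of the proof relies on the elementary bound $P(N \ge 2) \le m^2$ for a Poisson variable $N$ with mean $m$. As a quick numerical sanity check (a minimal Python sketch; the function name `poisson_tail_ge2` is ours and not part of the paper), one can compare the exact tail with $m^2$:

```python
import math

def poisson_tail_ge2(mean: float) -> float:
    """Exact P(N >= 2) for N ~ Poisson(mean)."""
    # P(N >= 2) = 1 - exp(-mean) - mean * exp(-mean).
    # -expm1(-mean) equals 1 - exp(-mean) and stays accurate for tiny means.
    return -math.expm1(-mean) - mean * math.exp(-mean)

# Compare the exact tail with the bound mean**2 over several orders of magnitude.
for mean in (1e-6, 1e-3, 0.1, 1.0, 5.0):
    tail = poisson_tail_ge2(mean)
    print(f"mean={mean:g}  P(N>=2)={tail:.3e}  bound={mean ** 2:.3e}  ok={tail <= mean ** 2}")
```

The bound is loose for large means, but it has the right order (about $m^2/2$) when $m$ is small, which is the regime of very small $nF_\lambda$ where it is applied.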
We apply Lemma 3 to bound the deviation and Lemma 2 to bound $E(|\hat\beta_\lambda - \beta_\lambda|^{2p})$. Hence,

2-2/p 



LDSM < D(l + e)p 2 n-^ q £ V x , n + 

Aer n V 



K A,n ^A • 



Since $\|\varphi_\lambda\|_\infty \le c_{\varphi,n}\sqrt{n}$,



LDSM < D(l + Ejp 2 ^/' ^ (F A 1+1/9 + F x ). 

Aer n 



Finally, as previously, by using (7.4),

LDSM < D(l +e)p 2 4 )n m^ n n-^log(n)(||/|| 1 )max(||/|| 1 ,l) 1 /5. 

7.2 Proof of Theorem 2

At first, we apply Theorem Q] with c^ )rj = 1 1 1 1 oo 2- 70 / 2 tt, 1 / 2 . For the last term, we want to prove that 
one can always find q and e such that 2- 7o n _7// (' ? ( 1+e )) _1 log(n) = o(n _1 ). But if 7 > c then one can 
always find $q > 1$ and $\varepsilon > 0$ such that $\gamma > cq(1+\varepsilon)$ and this implies also that $\gamma > 1 + \varepsilon$. So, by
exchanging the infimum and the expectation we obtain that

E(||/„, 7 -/|||)<(l + 2 e - 1 ) inf \{l + 2e- l )Y,(il+Y.^n + nvl,)] 

I A(£rn A£m 



| C 2 (7,ll/lli,c,c» 



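But for all $\delta > 0$,

$E(\eta_{\lambda,\gamma}^2) \le (1+\delta)\,2\gamma\log n\,E(\tilde V_{\lambda,n}) + (1+\delta^{-1})\left(\frac{\gamma\log n}{3n}\right)^2 \|\varphi_\lambda\|_\infty^2.$

Moreover

$E(\tilde V_{\lambda,n}) \le (1+\delta)V_{\lambda,n} + (1+\delta^{-1})\,3\gamma\log n\,\frac{\|\varphi_\lambda\|_\infty^2}{n^2}.$

So, finally for all $\delta > 0$,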

E(||/„, 7 -/|||)<(l + 2e- 1 ) 

inf J (1 + £ $ + £ [ £ + (1 + 5) 2 2 7 logn]F A , n + c(«5, 7) £ f l0g " llyAi0 ° ) 



2 



A^m A£m A£m 



| C 2 ( 7 ,||/||i,cy,<p) 
n 



where $c(\delta,\gamma)$ is a positive constant. One needs the following lemma.
Lemma 4. We set

$S_\varphi = \max\{\sup_{x\in\mathrm{supp}(\phi)} |\phi(x)|,\ \sup_{x\in\mathrm{supp}(\psi)} |\psi(x)|\}$

and

$I_\varphi = \min\{\inf_{x\in\mathrm{supp}(\phi)} |\phi(x)|,\ \inf_{x\in\mathrm{supp}(\psi)} |\psi(x)|\}.$



Using \3. fy) . we define @ v = jt- We have, for all A € A, 



ifF x <Q^,thenPl<eyi^ 



2 <r Q2 Jl log(n) 



- ifF x >G v ^, then 



lQ g(") <- . /MM 

OO r, — CT A 1 



Proof. We write $\lambda = (j,k)$ and assume that $j \ge 0$ (the arguments are similar for $j = -1$).

i fi 7 A < i£iW )Wehave 



log(ra) 



n 



log(n) 



n 



log(n) 



7? 



since 



For the second point, observe that 



aj 1 ^ > 2^%^°^ 



n 



n 



and 



'Aoc 



lo gQ) < 2 j/2 s iogQ) 



11 



n 



Now let us apply (7.6) for some fixed $\delta$, $\varepsilon$ to



m = < A G T r , 



$ > ejAlogn 

^ n 
This implies that for all $\lambda \in m$, $F_\lambda \ge \Theta_\varphi \frac{\log n}{n}$. So, since $\Theta_\varphi \ge 1$,



E(|/ nj7 -/|||)<C( 7 )x 
Aer„ ^ su ^ » logn A^r„ Aer n 



logn 2 / logn 
cr A + 



n 



n 



1 2 



C 2 ( 7 , ||/||i, c,c» 



<C(7) 



E fel^<6» V Ain logn + 21ognF Ajn l^ >e a V A ,„logn) + E # 



Aer n 



A^r„ 



+ 



C 2 ( 7 , H/Hi, c,c» 



< Ca( 7 ) 



£ min(/3|,8^ A)n logn)+ £ /?. 
Aer„ A^r n 



+ 



C 2 ( 7 ,||/||i, c,J,<p) 



n 



where $C(\gamma)$ and $C_1(\gamma)$ are positive quantities depending only on $\gamma$.



7.3 Proof of Theorem 3



Let us assume that $f$ belongs to $B^{\alpha/(1+2\alpha)}_{2,\infty}(R^{1/(1+2\alpha)}) \cap W_\alpha(R) \cap L_1(R) \cap L_2(R)$. Inequality (jUJ) of
Theorem 2 implies for all $n$,



E(|/ n , 7 -/|||)<C 1 ( 7 ^) 



V ( 3\\ n — + F An lognl 



Er) + E £ 

1 / \ rfr 



+ 



+ 



C 2 (j,R, c,d,(p) 



But 



+00 



-<2- fe /3? 



Aer„ 



Aer„ 
+00 



fc=0 



fc=o AeA 
+00 2 / 

< ^2- k R^ ( 2 (AH 



|/9 A |<2(*+i)/a^J^ 



l)/2 



fe=0 



logn 



l + 2a 



r? 



fc=0 



and 



E ^A < ^^Pn, a - 

A^r„ 
So,



E(||/ n , 7 - /|||) < C( 7 , <p, a)R^p'i a + 
where $C(\gamma,\varphi,\alpha)$ depends on $\gamma$, the basis and $\alpha$. Hence,



2 , C 2 ('j,R,c,(/,(p) 



n 



E( 



'n,7 j II <p 



f\\l) < C( 7 , ajfl^p* (1 + o n (l)) 



and $f$ belongs to $MS(\tilde f_\gamma, \rho_\alpha)(R')$ for $R'$ large enough.

Conversely, let us suppose that $f$ belongs to $MS(\tilde f_\gamma, \rho_\alpha)(R') \cap L_1(R') \cap L_2(R')$. Then, for any
$n$,

n .9 / loETT. \ 1 + 2a 

Consequently, for any n, 



n 



A^r„ 



y 2 AognN 1+2a 



n 



This implies that $f$ belongs to $B^{\alpha/(1+2\alpha)}_{2,\infty}(R')$.

Now, we want to prove that $f \in W_\alpha(R)$ for $R > 0$. We have



AeA 

But ft =4l|4 A |>^ , so, 



Aer„ 



W m <?>xi < I/?a-/?a|- 



So, for any n 

E^ 



AeA 



Agr„ 



Aer„ 



A^r„ 



Aer„ 



Aer„ 



^ E ^ + E E ^ - /3a) 2 ] + E $ p U 

Agr n Aer n Aer n V 



7l°g n > ^Vy 
2ra 2 



< E(|/ n>7 - /|||) + E 0aW"a 
Aer n V 



2n 2 



Using Lemma [H 



and 



> ?7A, 7 I < P(Vx,n < ^A,n) < n"T 



E/^ 



AeA 



I/3a|<*ai 



/ 7logn 
2n 



< (#) 



i\2 



logn 



■;?■ 



+ ll/ll|n- 7 . 
Since this is true for every $n$, we have for any $t < 1$,



4cn 

/ 2 \ l+2a 



where $R$ is a constant large enough depending on $R'$. Note that



AeA 



We conclude that

$f \in B^{\alpha/(1+2\alpha)}_{2,\infty}(R) \cap W_\alpha(R)$

for $R$ large enough.

7.4 Proof of Proposition 1

Since $\beta < 1/2$, $f_\beta \in L_1 \cap L_2$. If the Haar basis is considered, the wavelet coefficients $\beta_{j,k}$ of $f_\beta$ can
be calculated and we obtain for any $j \ge 0$, for any $k \notin \{0,\dots,2^j-1\}$, $\beta_{j,k} = 0$ and for any $j \ge 0$,
for any $k \in \{0,\dots,2^j-1\}$,

1-/3 \ 
1-/3 _ (jfe 



and there exists a constant $0 < c_{1,\beta} < \infty$ only depending on $\beta$ such that

$\lim_{k\to\infty} 2^{j(1/2-\beta)} k^{1+\beta} \beta_{j,k} = c_{1,\beta}.$

Moreover the $\beta_{j,k}$'s are strictly positive. Consequently they can be bounded from above and below, up to a
constant, by $2^{-j(1/2-\beta)} k^{-(1+\beta)}$. Similarly, for any $j \ge 0$, for any $k \in \{0,\dots,2^j-1\}$,

$\sigma_{j,k}^2 = (1-\beta)^{-1} 2^{j\beta}\left((k+1)^{1-\beta} - k^{1-\beta}\right),$

and there exists a constant $0 < c_{2,\beta} < \infty$ only depending on $\beta$ such that

$\lim_{k\to\infty} 2^{-j\beta} k^{\beta} \sigma_{j,k}^2 = c_{2,\beta}.$

There exist two constants $\kappa(\beta)$ and $\kappa'(\beta)$ only depending on $\beta$ such that for any $0 < t < 1$,



|^,fc| < ta j>k ^k> n(j3)t ^2 3 \^ 
and 

KXfi)t~T&*tf\W > 2 j 2 j < «'(/3)t"S. 
So, if 2* < «/(/3)t~l, since /3 jfc = for k > V, 

fcez 
We obtain

+00 2^-1 

£ <w> E s^ 1 ' 2 "' W m h E "-^ £ cw* 1 * 4 . 

AeA j=-i fc=i 

where C(/3) and C'(f$) denote two constants only depending on f3. So, for any < a < 7, if we take 
/? < 2+4a , then, for any < i < 1, t 3 < £1+20 . Finally, there exists c > 1, such that for any n, 

where R > 0. And in this case, 

//3 Loo, U e Bgf 55 n W Q := M5(/ T , Pa ). 

7.5 Proof of Theorem 4

Since 

V A = (j, fe), ^ < min [max(2^; ! II/IUM!!] , (7.8) 

where </? 6 ip} according to the value of j, we have for any t > and any J > 

2-p 



j<j k j>j k 



00HI 



2-p 



1/%.* 



< c(^, J R')(2 j t 2 +t 2 - p j:j:i/3 J , 



where $c(\varphi,R')$ is a constant only depending on the basis and on $R'$. Now, let us assume that $f$
belongs to $B^\alpha_{p,\infty}(R)$ (that contains $B^\alpha_{p,q}(R)$, see Section 3), with $\alpha + \frac{1}{2} - \frac{1}{p} > 0$. Then,

A 



where $c_1(\varphi,\alpha,p,R')$ depends on the basis, $\alpha$, $p$ and $R'$. With $J$ such that

2 J < RTT^t^ < 2 J+1 , 
J^^M^xt - C2(v,a,p,R')R^tTTt 

A 

where $c_2(\varphi,\alpha,p,R')$ depends on the basis, $\alpha$, $p$ and $R'$. So, $f$ belongs to $W_\alpha(R'')$ for $R''$ large
enough.

Furthermore, using (3.3), if $p \le 2$ and

1 \ 1 1 

a 1 -, r > 

c(l + 2a) / ~ p 2 
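$B^\alpha_{p,\infty}(R) \subset B^{\alpha/(1+2\alpha)}_{2,\infty}(R).$

Finally, for $R''$ large enough,

$B^\alpha_{p,q}(R) \subset B^\alpha_{p,\infty}(R) \subset B^{\alpha/(1+2\alpha)}_{2,\infty}(R'') \cap W_\alpha(R'').$

We recall

$MS(\tilde f_\gamma, \rho_\alpha) := B^{\alpha/(1+2\alpha)}_{2,\infty} \cap W_\alpha,$

which proves (4.3).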
Moreover 

inf sup E(||/-/|| 2 ) > C{a,R,R')n~^+i, 

where $C(\alpha,R,R')$ is a constant. Indeed, using computations similar to those of Theorem 2 of [17],
it is easy to prove that if $K$ is a compact interval and $B^\alpha_{p,q,K}(R)$ is the set of functions supported by
$K$ and belonging to $B^\alpha_{p,q}(R)$, the minimax risk associated with $B^\alpha_{p,q,K}(R)$ is larger than $n^{-2\alpha/(1+2\alpha)}$
up to a constant.

But (4.4) implies that $\alpha > \alpha^*$ and $p > p^*$ satisfy (4.2). This proves the adaptive minimax properties
of $\tilde f_\gamma$ stated in the theorem.

7.6 Proof of Theorem 5

The proof is established for $p < \infty$. Similar arguments lead to the same results for $p = \infty$. Let us
fix real numbers $n_* > 1$ and $f_* > 1$ and let us define the following increasing sequence

a = 0, oi = 4 and V I > 1, a i+ i = 2a ; + 2f n * z l +1 . 

Let bi = ^ - 1. Let J+ fe = [jfe2~^, (A; + 1/2)2^'] and J7 fe = [(Jfe + 1/2)2^', (fc + 1)2^']. Set for all 

fl(x) = £ 2^M l +\ + 

' ' l,m 

m=ai 

and 

+ OO 

1=0 

The //'s have support in Si = [ai2~ l , a; + i2~(' +1 ) [. All the Si's are disjoint and we can prove by 
an easy induction that all the a;2~''s are even positive integer numbers (indeed, a; + i2~(' +1 ) = 
2[WH + 0{ 2- 1 and \nj] - Z > if 1^0). 

Now, let us compute the wavelet coefficients associated with $f$, denoted $\beta_{j,k}$ for $j \ge 0$ and for
any $k \in \mathbb{Z}$, and $\alpha_k = \beta_{-1,k}$ for any $k \in \mathbb{Z}$. We are working with the Haar basis. Recall that the
spaces considered are viewed as sequence spaces.

For the $\beta_{j,k}$'s, let us remark that $\mathrm{supp}(\varphi_{j,k})$ is always included between two successive integers,
consequently there exists a unique $l_{j,k}$ such that $\mathrm{supp}(\varphi_{j,k}) \subset S_{l_{j,k}}$. So,



Pj,k - / fij, k fj,k- 
Moreover, if $j \neq l_{j,k}$, the coefficient is zero: either $j > l_{j,k}$ and $\varphi_{j,k}$ sees only one flat line, or $j < l_{j,k}$
and $\varphi_{j,k}$ integrates the same number of flat pieces in $I^+_{j,k}$ and $I^-_{j,k}$; since the pieces all have the
same level, this is also 0. Finally, for $j = l_{j,k}$ the computation is easy and we find

jtk = 2^f^ +1 [ 2^ 2 [1 I+ - l Ir ] x l a .< fc < 6 . = 2-^*- 1 / 2 h aj < k < bj . 

JJ+ 3, k i, k 

For the coefficients a^'s, there exists also a unique l k such that supp(ip-i^) C Si k and 

„, _ 2 (i-/.)( t +il_ V2< w «' i l 

a k — z 2 ~~ i a ; 2- i <fc<a i+1 2-< ! + 1 )- 

Now, we want to compute aj tk when Pj^ 7^ 0. If j > 

F iifc = / f(x)dx = yV-f'h-i = 2-1 

J supp(ip J%k ) 

\k = I i>l k (x)f(x)dx = 2> [ f(x)dx = VF hk = 2^-^. 



4 



if j = -i 



I supp(ip jik ) 



Now, we fix the parameters $n_*$ and $f_*$ such that

1. $\|f\|_1 < \infty$, $\|f\|_2 < \infty$, $\|f\|_\infty < \infty$,

2. $f \in B^\alpha_{p,\infty}$,

3. $f \notin W_\alpha$.
Since $f_* > 1$, then $\|f\|_\infty < \infty$. We have



+oo bi +oo 

|i = E 2( 1 -/*)'+ 1 2-'- 1 =^2MrW < oo ^ /* > n,. (7.9) 

1=0 m=ai 1=0 



We have for all j > 



Then, 



= ^| 2 -^-v2)|p 

k k — a j 

= 2^ n *^2^ p / 2 2~~^* p . 



f e B% iO0 3i? > 0, Vj > 0, 2 j ( n * +p / 2 - f *ri < jR P 2 --'' p ( a+1 / 2 - 1 / p ) 

-^=>- n* + p/2 - /*p < -pa - p/2 + 1 

n* < pf* - V + 1 - P<*- (7-10) 
Indeed, note that we have



fcez i>o 



< oo 



if and only if $n_* - 1 + p - f_* p < 0$, which is true as soon as $f_* > n_*$. Note also that

$\|f\|_2 < \infty \iff 2f_* > 1 + n_*,$

which is also true as soon as $f_* > n_*$.

Now, we would like to build $f$ such that $f$ does not belong to $W_\alpha$. We have for any $t < 1$,



E^ 1 



l/3j,fcl<*o r J -,fc 



^ 2 -2j(/»-l/2) 1 



2-J(/*-i/2)<j2J(i-/*)/2 



2 j(i-2/«)2Kili ., , 



so, with j = ri og2 (r 2 //*)i 



sup f 

t<l 



-4a/(l+2a) 



sup f-4«/(l+2a) r 2(l+n*-2/*)//, = +oo 



j k=aj 



t<l 



$-2(1 + n_* - 2f_*)/f_* < 4\alpha/(1+2\alpha)$

$\iff 2f_* - n_* - 1 < \frac{2\alpha f_*}{1+2\alpha}$

$\iff n_* > -1 + \frac{2f_*(1+\alpha)}{1+2\alpha}$ (7.11)



and in this case, $f \notin W_\alpha$. Now, we choose $n_* > 1$ and $f_* > 1$ such that (7.9), (7.10) and (7.11) are
satisfied. For this purpose, we take



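$f_* = 1 + 2\alpha - \delta \in \left]\frac{(1+2\alpha)(p\alpha + p - 2)}{2p\alpha + p - 2\alpha - 2},\ 1 + 2\alpha\right[$

for $\delta \in ]0,\alpha[$ and $\delta$ small enough. Note that $p > 2$ implies

$\frac{(1+2\alpha)(p\alpha + p - 2)}{2p\alpha + p - 2\alpha - 2} < 1 + 2\alpha.$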


We also take

$n_* = \min(f_* - \delta',\ pf_* - p + 1 - p\alpha) \in ]1,\ pf_* - p + 1 - p\alpha]$
for $\delta'$ small enough. Note that

$pf_* - p + 1 - p\alpha = p(1 + 2\alpha - \delta) - p + 1 - p\alpha = p\alpha + 1 - p\delta > 1.$

With such a choice, we have $n_* < f_*$ and $n_* \le pf_* - p + 1 - p\alpha$. So (7.9) and (7.10) are satisfied.
It remains to check (7.11). We have


$pf_* - p + 1 - p\alpha > -1 + \frac{2f_*(1+\alpha)}{1+2\alpha}$

$\iff f_*\left(\frac{2(1+\alpha)}{1+2\alpha} - p\right) < 2 - p - p\alpha$

$\iff f_*(2 + 2\alpha - p - 2p\alpha) < (1+2\alpha)(2 - p - p\alpha)$

$\iff f_* > (1+2\alpha)\frac{p\alpha + p - 2}{2p\alpha + p - 2\alpha - 2},$

and

$f_* - \delta' > -1 + \frac{2f_*(1+\alpha)}{1+2\alpha}$

$\iff f_*\left(\frac{2(1+\alpha)}{1+2\alpha} - 1\right) < 1 - \delta'$

$\iff 2(1+\alpha)f_* - f_*(1+2\alpha) < (1+2\alpha)(1-\delta')$

$\iff f_* < (1+2\alpha)(1-\delta'),$

which is true for $\delta'$ small enough. So (7.11) is satisfied, which concludes the proof of the theorem.
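
The parameter choice can be checked numerically. The sketch below (Python; the function names and the sample values $\alpha = 0.5$, $p = 3$, $\delta = 0.05$, $\delta' = 0.01$ are illustrative assumptions, not taken from the paper) evaluates $f_* = 1 + 2\alpha - \delta$ and $n_* = \min(f_* - \delta', pf_* - p + 1 - p\alpha)$ and tests the three requirements $f_* > n_*$ (7.9), $n_* \le pf_* - p + 1 - p\alpha$ (7.10) and $n_* > -1 + 2f_*(1+\alpha)/(1+2\alpha)$ (7.11):

```python
def choose_parameters(alpha: float, p: float, delta: float, delta_prime: float):
    """Return (f_star, n_star) following the choice made in the proof."""
    f_star = 1 + 2 * alpha - delta
    n_star = min(f_star - delta_prime, p * f_star - p + 1 - p * alpha)
    return f_star, n_star

def check_conditions(alpha: float, p: float, f_star: float, n_star: float):
    """Evaluate the requirements (7.9), (7.10) and (7.11) as booleans."""
    c79 = f_star > n_star
    c710 = n_star <= p * f_star - p + 1 - p * alpha
    c711 = n_star > -1 + 2 * f_star * (1 + alpha) / (1 + 2 * alpha)
    return c79, c710, c711

# Hypothetical sample values with p > 2 and small delta, delta_prime.
alpha, p = 0.5, 3.0
f_star, n_star = choose_parameters(alpha, p, delta=0.05, delta_prime=0.01)
print("f_* =", f_star, " n_* =", n_star)
print("(7.9), (7.10), (7.11):", check_conditions(alpha, p, f_star, n_star))
```

For these values the lower end of the admissible interval for $f_*$ is $(1+2\alpha)(p\alpha+p-2)/(2p\alpha+p-2\alpha-2) = 5/3$, so $f_* = 1.95$ lies inside it, as the proof requires.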
7.7 Proof of Theorem 6

The proof is established for $q = \infty$ and $p < \infty$. Similar arguments lead to the same results for
$p = \infty$. In the sequel, $C$ designates a constant depending on $R'$, $\gamma$, $c$, $c'$, on the parameters of the
Besov ball, on the basis and that may change at each line. We have for any $0 < t < 1$ and any
$J \ge 0$,



(7.12) 



with i + i = 1. So, using JESJ, we have if / € W^') n Li(iJ') n Bp j00 (R), 



< C2 



Indeed, 

/ € Li(#) £ < C2l 
(see p. 197). So, for a > l/(2p), we have for any t > and any J > 

E^ 1 fe |/%,fc|<0j,fct 

A j fc 



c 



< c 



* 2 E 2i E^ + E 2 



t 2 2 J + 2- 7 >-^)t 1 - 



(7.13) 



using (|7.8p again 



With 
we have



We obtain 



So, with t 



Furthermore, 



£$1 



I/3aI<«-a* 



logyt 
n ' 



^ \M<«x^ ~ \ n J 



logn\ c+i-i 



+00 



'-<2- k /3? 



Aer„ 



Aer„ 
+00 



fc=0 



< E 2 "*E^ 

fc=o AeA 
+00 / 

< C^2- k 2^ 



\p x \<zv>+w ax J&F. 



_ 1)/2 /bi^v +1 "^ 



k=0 



n 



< C 



< c 



1 \ SL T- +°° /, 1 a<j!+V 



71 



? E 2 



2? 



fc=0 



Now, using 1(712)1 . ((7131) and (ET5J) we have when A = (j, fc) T r 



E#* ^ C2 



-j a+ 



2 P 



< C2 j V Q+ 2 Py 



Ei^Ksupi^ir 1 

v k 

Ei^i 2- 



j(r-l) 



and applying Theorem El we obtain for c > 1, 
and 

Edi/^-ziiDsc^)^. 
7.8 Proof of Theorem 7

Let us consider the Haar basis. For $j \ge 0$ and $D \in \{0,1,\dots,2^j\}$, we set

$\mathcal{C}_{j,D} = \{f_m = \rho 1_{[0,1]} + a_{j,D}\sum_{k\in m} \varphi_{j,k} : |m| = D,\ m \subset \mathcal{N}_j\},$

where

$\mathcal{N}_j = \{k : \varphi_{j,k} \text{ has support in } [0,1]\}.$
The parameters $j$, $D$, $\rho$, $a_{j,D}$ are chosen later to fulfill some requirements. Note that

$N_j = \mathrm{card}(\mathcal{N}_j) = 2^j.$

We know that there exists a subset of $\mathcal{C}_{j,D}$, denoted $\mathcal{M}_{j,D}$, and some universal constants, denoted
$\theta'$ and $\sigma$, such that for all $m, m' \in \mathcal{M}_{j,D}$,

$\mathrm{card}(m\Delta m') \ge \theta' D, \quad \log(\mathrm{card}(\mathcal{M}_{j,D})) \ge \sigma D\log\left(\frac{2^j}{D}\right)$

(see Lemma 8 of [HI]). Now, let us describe all the requirements necessary to obtain the lower
bound of the risk.

• To ensure $f_m \ge 0$ and the equivalence between the Kullback distance and the $L_2$-norm (see
below), the $f_m$'s have to be larger than $\rho/2$. Since the $\varphi_{j,k}$'s have disjoint support, this means
that

$\rho \ge 2^{1+j/2}|a_{j,D}|.$ (7.14)

• We need the $f_m$'s to be in $L_1(R'') \cap L_\infty(R'')$. Since $\|f\|_1 = \rho$ and $\|f\|_\infty = \rho + 2^{j/2}|a_{j,D}|$, we
need

$\rho + 2^{j/2}|a_{j,D}| \le R''.$ (7.15)

• The $f_m$'s have to belong to $B^{\alpha/(1+2\alpha)}_{2,\infty}(R')$, i.e.

$\rho + 2^{j\alpha/(1+2\alpha)}\sqrt{D}\,|a_{j,D}| \le R'.$ (7.16)

• The $f_m$'s have to belong to $W_\alpha(R)$. We have $\sigma_\lambda^2 = \rho$. Hence for any $t > 0$,

$\rho^2 1_{\rho \le \sqrt{\rho}\,t} + D a_{j,D}^2 1_{|a_{j,D}| \le \sqrt{\rho}\,t} \le R^{2/(1+2\alpha)} t^{4\alpha/(1+2\alpha)}.$

If $|a_{j,D}| \le \rho$, then it is enough to have

$\rho^2 + D a_{j,D}^2 \le R^{2/(1+2\alpha)} \rho^{2\alpha/(1+2\alpha)}$ (7.17)

and

$D a_{j,D}^2 \le R^{2/(1+2\alpha)} \left(\frac{a_{j,D}^2}{\rho}\right)^{2\alpha/(1+2\alpha)}.$ (7.18)
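If the parameters satisfy these equations, then

$\mathcal{R}(W_\alpha(R) \cap B^{\alpha/(1+2\alpha)}_{2,\infty}(R') \cap L_{1,2,\infty}(R'')) \ge \mathcal{R}(\mathcal{M}_{j,D}).$
Moreover if for any estimator $\hat f$, we define $\hat f' = \arg\inf_{g\in\mathcal{M}_{j,D}} \|g - \hat f\|_2$, then for $f \in \mathcal{M}_{j,D}$,

$\|f - \hat f'\|_2 \le \|f - \hat f\|_2 + \|\hat f - \hat f'\|_2 \le 2\|f - \hat f\|_2.$

Hence,

$\mathcal{R}(\mathcal{M}_{j,D}) \ge \frac{1}{4}\inf_{\hat f \in \mathcal{M}_{j,D}}\ \sup_{f \in \mathcal{M}_{j,D}} E(\|f - \hat f\|_2^2).$

But for every $m \neq m'$, $\|f_m - f_{m'}\|_2^2 = \sum_{k \in m\Delta m'} a_{j,D}^2 \ge \theta' D a_{j,D}^2$. Hence,

$\mathcal{R}(\mathcal{M}_{j,D}) \ge \frac{1}{4}\theta' D a_{j,D}^2 \inf_{\hat f \in \mathcal{M}_{j,D}} \left(1 - \inf_{f \in \mathcal{M}_{j,D}} P_f(\hat f = f)\right).$

We now use Fano's lemma and, to do so, we need to provide an upper bound of the Kullback-
Leibler distance between two points of $\mathcal{M}_{j,D}$. But for every $m \neq m'$,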

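$K(P_{f_{m'}}, P_{f_m}) = n\int_{\mathbb{R}} f_{m'}\left(\exp\left(\log\frac{f_m}{f_{m'}}\right) - \log\frac{f_m}{f_{m'}} - 1\right)$

$= n\int_{\mathbb{R}} \left(f_m - f_{m'} - f_{m'}\log\left(1 + \frac{f_m - f_{m'}}{f_{m'}}\right)\right)$

$\le n\int_{\mathbb{R}} \frac{(f_m - f_{m'})^2}{f_m}$

$\le \frac{2}{\rho}\,n\,\|f_m - f_{m'}\|_2^2$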

2 2 

< -nDa j D , 

p J > 

since $\log(1+x) \ge x/(1+x)$. So finally, following arguments similar to those used by [IH (pages
148 and 149), Fano's lemma implies that there exists an absolute constant $c < 1$ such that

TZ(M hD ) > { l^-Da\ D 
as soon as the mean Kullback Leibler distance is small enough, which is implied by 

-nDa 2 jD <caD\og(2 j ID). (7.19) 
p J ' 
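
The chain of inequalities behind this Kullback-Leibler bound can be illustrated numerically for piecewise-constant intensities. The following Python sketch (function names and the toy intensities are hypothetical, chosen only for illustration) computes $K(P_{f_{m'}}, P_{f_m}) = n\int (f_m - f_{m'} - f_{m'}\log(f_m/f_{m'}))$ and compares it with $n\int (f_m - f_{m'})^2/f_m$ and with $(2/\rho)\,n\,\|f_m - f_{m'}\|_2^2$, the last bound assuming $f_m \ge \rho/2$:

```python
import math

def kl_poisson(f_from, f_to, widths, n):
    """K(P_from, P_to) for Poisson processes with intensities n*f_from and n*f_to,
    both piecewise constant on cells of the given widths."""
    return n * sum(w * (ft - ff - ff * math.log(ft / ff))
                   for ff, ft, w in zip(f_from, f_to, widths))

def chi_square_bound(f_from, f_to, widths, n):
    """n * integral of (f_to - f_from)^2 / f_to."""
    return n * sum(w * (ft - ff) ** 2 / ft for ff, ft, w in zip(f_from, f_to, widths))

def l2_bound(f_from, f_to, widths, n, rho):
    """(2 / rho) * n * ||f_to - f_from||_2^2, valid when f_to >= rho / 2."""
    return 2.0 / rho * n * sum(w * (ft - ff) ** 2 for ff, ft, w in zip(f_from, f_to, widths))

# Toy example: rho = 1, Haar bumps of height 0.3 at level j = 2 on [0, 1],
# with m = {0, 1} and m' = {1, 2}; each Haar wavelet covers two cells of width 1/8.
rho, n = 1.0, 1000
widths = [1 / 8] * 8
f_m = [1.3, 0.7, 1.3, 0.7, 1.0, 1.0, 1.0, 1.0]
f_mp = [1.0, 1.0, 1.3, 0.7, 1.3, 0.7, 1.0, 1.0]

kl = kl_poisson(f_mp, f_m, widths, n)
print(kl, chi_square_bound(f_mp, f_m, widths, n), l2_bound(f_mp, f_m, widths, n, rho))
```

The three printed numbers are increasing, in line with the successive upper bounds used in the proof.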

Let us take $j$ such that $2^j \le n/\log n < 2^{j+1}$ and with $D \le 2^j$,

4d = ^o S (V/D). 

First note that (7.19) is automatically fulfilled as soon as $\rho \le 2c\sigma$, which is true if $\rho$ is an absolute
constant small enough. Then



*VX D |<P + W^<1.5,. 
So, if $\rho$ is an absolute constant small enough, (7.15) is satisfied. Moreover



2^\a jD \ < 2 1 W^ < p. 

V An 

This gives (7.14). Now, take an integer $D = D_n$ such that

l/(l+2a) 



D ~ i?2/(l+2a) 



)1 



s logn / 

For $n$ large enough, $D_n \le 2^j$ and $D_n$ is feasible. We have for $R$ fixed,



2 2 lQ g n 



n 

where $c_\alpha$ is a constant only depending on $\alpha$. Therefore,

p + 2 ja/(l+2a)^| a . jl? j =p+y fe pR Va+**) + Gn(1) 

Since $R^{1/(1+2\alpha)} \le R'$, it is sufficient to take $\rho$ small enough but constant depending only on $\alpha$ to
obtain (7.16). Moreover,



D a 2 n ~ c ^i? 2 /^ 2 ") 



2a/(l+2a) 



Hence (T7T7D is equivalent to p 2 < j R2/(i+2a) p 2a/(i+2a)_ 

Since $R \ge 1$, this is true as soon as $\rho \le 1$.

Finally (7.18) is equivalent, when $n$ tends to $+\infty$, to

c a p 2 <( CaP r/^. 

Once again this is true for $\rho$ small enough depending on $\alpha$. As we can choose $\rho$ not depending on
$R$, $R'$, $R''$, this concludes the proof.

Corollary 2 is completely straightforward once we notice that if $R' \ge R$ then for every $\alpha$, $R' \ge R^{1/(1+2\alpha)}$.

7.9 Proof of Theorem 8

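Let $\alpha > 1$ and $n$ be fixed. We set $j$ a positive integer such that

$\frac{n}{(\log n)^\alpha} \le 2^j < \frac{2n}{(\log n)^\alpha}.$

For all $k \in \{0,\dots,2^j-1\}$, we define

$N^+_{j,k} = \int_{k2^{-j}}^{(k+1/2)2^{-j}} dN, \quad N^-_{j,k} = \int_{(k+1/2)2^{-j}}^{(k+1)2^{-j}} dN.$

All these variables are iid Poisson random variables with parameter $\mu_{n,j} = n2^{-j-1}$. Moreover,

$\hat\beta_{j,k} = \frac{2^{j/2}}{n}(N^+_{j,k} - N^-_{j,k})$ and $\hat V_{(j,k),n} = \frac{2^j}{n^2}(N^+_{j,k} + N^-_{j,k}).$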
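
For concreteness, the two statistics above can be computed directly from the half-interval counts. This is a minimal Python sketch, assuming the Haar normalisation $\hat\beta_{j,k} = 2^{j/2}(N^+_{j,k} - N^-_{j,k})/n$ and $\hat V_{(j,k),n} = 2^j(N^+_{j,k} + N^-_{j,k})/n^2$; the function names are ours. It also shows that comparing $|\hat\beta_{j,k}|$ with the leading term $\sqrt{2\gamma\log(n)\hat V_{(j,k),n}}$ of the threshold amounts to comparing $|N^+ - N^-|$ with $\sqrt{2\gamma\log(n)(N^+ + N^-)}$:

```python
import math

def haar_estimates(n_plus: int, n_minus: int, j: int, n: float):
    """Return (beta_hat, v_hat) for one Haar coefficient at level j."""
    beta_hat = 2 ** (j / 2) / n * (n_plus - n_minus)
    v_hat = 2 ** j / n ** 2 * (n_plus + n_minus)
    return beta_hat, v_hat

def exceeds_on_coefficient_scale(n_plus, n_minus, j, n, gamma):
    """Test |beta_hat| >= sqrt(2 * gamma * log(n) * v_hat)."""
    beta_hat, v_hat = haar_estimates(n_plus, n_minus, j, n)
    return abs(beta_hat) >= math.sqrt(2 * gamma * math.log(n) * v_hat)

def exceeds_on_count_scale(n_plus, n_minus, n, gamma):
    """Same test after multiplying both sides by n * 2**(-j/2)."""
    return abs(n_plus - n_minus) >= math.sqrt(2 * gamma * math.log(n) * (n_plus + n_minus))

# Illustrative counts (hypothetical values) at level j = 3 with n = 1000, gamma = 0.5.
for counts in ((9, 1), (12, 0)):
    print(counts,
          exceeds_on_coefficient_scale(*counts, j=3, n=1000, gamma=0.5),
          exceeds_on_count_scale(*counts, n=1000, gamma=0.5))
```

Working on the count scale removes $j$ and the factor $1/n$ from the comparison, which is why the lower bound can be expressed through two independent Poisson variables only.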
Hence,



2^-1 



2-' 



E(||/n, 7 - /|||) > £^f(iV+ - AT fc ) 2 l 



fc=0 



|JV+ fc -N-J>^2 7 log(n)(JV J + +AT- fc )+log(n) U „ y / ' 



Denote by v nj - = 4~flog(n) fi n j +log(n)n„) 2 . Remark that if N^ k = fi n j + and iV- fc 



2 ' 



l^tfc " ^fcl = ^2 7 log(n)(iV+ fe + ATfc) + log(n)« n . 
Let $N^+$ and $N^-$ be two independent Poisson variables of parameter $\mu_{n,j}$. Then,



2 2i 



E(|/„ )7 - /|||) > ^v nJ ¥ ( iV+ = ^ + ^ and iV~ = ^ - 



Note that 



and 



J n,3 



Jim ^Z = . 

n->+oo // nj - 

So, Z nj j = /j, n j + - ^ and m n j = fi n j — ^ v ™' 3 go to +oo with n. Hence by Stirling formula, 



E(||/„ )7 -/|||) > 



(logn) 



2a 



= Mnj + ^ P AT- = /i. 



u II 

(logn) 2a Z nJ ! m nJ ! 



> 



> 



4 ^.J f thhl\ ^ r-^i-lnj) ( I^hl\ mn ' 3 r -(^-m^) (l + On(l)) 



(logn) 2 "- 1 V k 



27 -Mn.j 

-e 



->/<„ 



2 M „ 



^(logn) 20 " 1 

where $h(x) = (1+x)\log(1+x) - x = x^2/2 + O(x^3)$. So,



(l + On(l)) 



Since 



we obtain 



3/2 

n,j 1 q I 71, j 



v n ,j = 4ry\og(n)fi n j(l + o„(l)), 
2 7 



(l+On(l))- 



^(ll/. 7 -/ll|)>- (logn)2a 



_ e -7log(n)+ „(log(n))^ + 



Finally, for every $\varepsilon > 0$,



1 



E(||/n, 7 -/|||)>^(l + O n (l)). 
7.10 Proof of Theorem 9

We use notations of Lemma 4. Let $f \in \mathcal{F}_n$. We apply (7.6) with $\varepsilon = 1.4$. Then, with $\gamma = 1 + \sqrt{2}$,
and $\delta > 0$ such that $(1+\delta)^2 = 11.8/(2\gamma \times (1 + 2/\varepsilon)) \simeq 1.006$, JESJ becomes

E(||/ nj7 -/|||)< 

11 Agm AGm AGm x 



C 2 (i,\\f\\i,c,c>,<p) 



n 

Now, take

$m = \{\lambda \in \Gamma_n : \beta_\lambda^2 > V_{\lambda,n}\}.$
If $m$ is empty, then $\beta_\lambda^2 = \min(\beta_\lambda^2, V_{\lambda,n})$ for every $\lambda$ of $\Gamma_n$. Hence

C2(7,[|/[|i, c,d,<p) 



E(|/n, 7 -/|$)< 6 + 



A 6 r„ 

The result is true for $n$ large enough even if the $\beta_\lambda$'s are all zero and this explains the presence of
$1/n$ in the oracle ratio.

If $m$ is not empty, write $\lambda = (j,k)$. Since $F_\lambda \le 2^{-j}\|f\|_\infty$, if $F_\lambda \neq 0$, then $2^j = O(n/\log n)$ and $\lambda \in \Gamma_n$.
Since

|/5a| < ^2^ 2 F A , 

this implies that $F_\lambda$ is non zero for all $\lambda \in m$, and that if $\beta_\lambda \neq 0$ then $\lambda \in \Gamma_n$. Now,

n n v nB^ 

Hence, for all n, if A G m, 

rr 1 ^ (logn) 2 (loglogn) 2 

y A , n iogn > —— 2 HvaIIoo 

and if n is large enough, 

0.21ogn J] F A , n > c(ff, 7 )(l + 26- 1 ) £ (^pY \<Px\lo + 3-4 £ F A , n . 

Agm Asm Agm 

7.11 Proof of Theorem 10

Before proving Theorem 10, let us state the following result.

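Proposition 2. Let $\gamma_{\min} \in (1,\gamma)$ be fixed and let $\eta_{\lambda,\gamma_{\min}}$ be the threshold associated with $\gamma_{\min}$:

$\eta_{\lambda,\gamma_{\min}} = \sqrt{2\gamma_{\min}\log n\,\tilde V_{\lambda,n}(\gamma_{\min})} + \frac{\gamma_{\min}\log n}{3n}\|\varphi_\lambda\|_\infty,$

where

$\tilde V_{\lambda,n}(\gamma_{\min}) = \hat V_{\lambda,n} + \sqrt{2\gamma_{\min}\log n\,\hat V_{\lambda,n}\frac{\|\varphi_\lambda\|_\infty^2}{n^2}} + 3\gamma_{\min}\log n\,\frac{\|\varphi_\lambda\|_\infty^2}{n^2}$

(see Theorem^). Let $u = (u_n)_n$ be some sequence of positive numbers and

$\Lambda_u = \{\lambda \text{ such that } P(\eta_{\lambda,\gamma} > |\beta_\lambda| + \eta_{\lambda,\gamma_{\min}}) \ge 1 - u_n\}.$

Then

$E(\|\tilde f_{n,\gamma} - f\|_{\tilde\varphi}^2) \ge \left(\sum_{\lambda\in\Lambda_u} \beta_\lambda^2\right)\left(1 - (3n^{-\gamma_{\min}} + u_n)\right).$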
\AeA u / 

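Proof.

$E(\|\tilde f_{n,\gamma} - f\|_{\tilde\varphi}^2) \ge \sum_{\lambda\in\Lambda_u} \beta_\lambda^2\, P(|\hat\beta_\lambda| < \eta_{\lambda,\gamma})$

$\ge \sum_{\lambda\in\Lambda_u} \beta_\lambda^2\, P(|\hat\beta_\lambda - \beta_\lambda| + |\beta_\lambda| < \eta_{\lambda,\gamma})$

$\ge \sum_{\lambda\in\Lambda_u} \beta_\lambda^2\, P(|\hat\beta_\lambda - \beta_\lambda| < \eta_{\lambda,\gamma_{\min}} \text{ and } \eta_{\lambda,\gamma_{\min}} + |\beta_\lambda| < \eta_{\lambda,\gamma})$

$\ge \sum_{\lambda\in\Lambda_u} \beta_\lambda^2 \left(1 - \left(P(|\hat\beta_\lambda - \beta_\lambda| \ge \eta_{\lambda,\gamma_{\min}}) + P(\eta_{\lambda,\gamma_{\min}} + |\beta_\lambda| \ge \eta_{\lambda,\gamma})\right)\right)$

$\ge \left(\sum_{\lambda\in\Lambda_u} \beta_\lambda^2\right)\left(1 - (3n^{-\gamma_{\min}} + u_n)\right),$

by applying Lemma [H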
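
To make the quantities in Proposition 2 concrete, here is a small Python sketch of the random threshold. It assumes the form $\eta_{\lambda,\gamma} = \sqrt{2\gamma\log n\,\tilde V_{\lambda,n}} + \frac{\gamma\log n}{3n}\|\varphi_\lambda\|_\infty$ with $\tilde V_{\lambda,n} = \hat V_{\lambda,n} + \sqrt{2\gamma\log n\,\hat V_{\lambda,n}\|\varphi_\lambda\|_\infty^2/n^2} + 3\gamma\log n\,\|\varphi_\lambda\|_\infty^2/n^2$; the function and argument names are illustrative and not taken from the paper.

```python
import math

def v_tilde(v_hat: float, sup_norm: float, n: float, gamma: float) -> float:
    """Inflated variance estimate entering the threshold (assumed form)."""
    c = gamma * math.log(n) * sup_norm ** 2 / n ** 2
    return v_hat + math.sqrt(2 * v_hat * c) + 3 * c

def threshold(v_hat: float, sup_norm: float, n: float, gamma: float) -> float:
    """Random threshold eta for one coefficient (assumed form)."""
    log_n = math.log(n)
    return (math.sqrt(2 * gamma * log_n * v_tilde(v_hat, sup_norm, n, gamma))
            + gamma * log_n * sup_norm / (3 * n))

# Hypothetical inputs: a Haar wavelet at level j = 3 (sup norm 2**1.5), n = 1000.
sup_norm, n, v_hat = 2 ** 1.5, 1000.0, 8e-5
for gamma in (1.5, 2.5):
    print(gamma, threshold(v_hat, sup_norm, n, gamma))
```

With $\hat V_{\lambda,n} = 0$ the threshold reduces to $(\sqrt{6} + 1/3)\gamma\log n\,\|\varphi_\lambda\|_\infty/n$, and it increases with $\gamma$, which is the monotonicity used when comparing $\gamma_{\min} < \gamma$.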

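Using this proposition, we give the proof of Theorem 10. Let us consider

$f = 1_{[0,1]} + \sum_{k\in\mathcal{N}_j} \sqrt{\frac{2(\sqrt{\gamma} - \sqrt{\gamma_{\min}})^2\log n}{n}}\,\varphi_{j,k}$

with

$\mathcal{N}_j = \{0,1,\dots,2^j-1\}$

and

$\frac{n}{(\log n)^{1+\alpha}} < 2^j \le \frac{2n}{(\log n)^{1+\alpha}}, \quad \alpha > 0.$

Note that for any $(j,k)$, if $F_{j,k} \neq 0$, then $F_{j,k} = 2^{-j} \ge \frac{(\log n)(\log\log n)}{n}$ for $n$ large enough and $f$
belongs to $\mathcal{F}_n$. Furthermore, $V_{(-1,0),n} = \frac{1}{n}$ and for any $k \in \mathcal{N}_j$, $V_{(j,k),n} = \frac{1}{n}$. So, for $n$ large enough,

$\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}) = V_{(-1,0),n} + \sum_{k\in\mathcal{N}_j} V_{(j,k),n} = \frac{1}{n} + \sum_{k\in\mathcal{N}_j} \frac{1}{n}.$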

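Now, to apply Proposition 2, let us set for any $n$, $u_n = n^{-\gamma}$ and observe that for any $\varepsilon > 0$,

$P(\eta_{\lambda,\gamma_{\min}} + |\beta_\lambda| > \eta_{\lambda,\gamma}) \le P\left((1+\varepsilon)2\gamma_{\min}\log n\,\tilde V_{\lambda,n}(\gamma_{\min}) + (1+\varepsilon^{-1})\beta_\lambda^2 > 2\gamma\log n\,\tilde V_{\lambda,n}(\gamma)\right),$

since $\gamma_{\min} < \gamma$. With $\varepsilon = \sqrt{\gamma/\gamma_{\min}} - 1$ and $\theta = \sqrt{\gamma_{\min}/\gamma}$,

$P\left((1+\varepsilon)2\gamma_{\min}\log n\,\tilde V_{\lambda,n}(\gamma_{\min}) + (1+\varepsilon^{-1})\beta_\lambda^2 > 2\gamma\log n\,\tilde V_{\lambda,n}(\gamma)\right)$

$= P\left(\theta\tilde V_{\lambda,n}(\gamma_{\min}) + (1-\theta)V_{\lambda,n} > \tilde V_{\lambda,n}(\gamma)\right).$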
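Since $\tilde V_{\lambda,n}(\gamma_{\min}) \le \tilde V_{\lambda,n}(\gamma)$,

$P(\eta_{\lambda,\gamma_{\min}} + |\beta_\lambda| > \eta_{\lambda,\gamma}) \le P(V_{\lambda,n} > \tilde V_{\lambda,n}(\gamma)) \le u_n.$

So,

$\{(j,k) : k \in \mathcal{N}_j\} \subset \Lambda_u,$

and

$E(\|\tilde f_{n,\gamma} - f\|_{\tilde\varphi}^2) \ge \sum_{k\in\mathcal{N}_j} \beta_{j,k}^2\left(1 - (3n^{-\gamma_{\min}} + n^{-\gamma})\right)$

$\ge (\sqrt{\gamma} - \sqrt{\gamma_{\min}})^2\, 2\log n \left(\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n}) - \frac{1}{n}\right)\left(1 - (3n^{-\gamma_{\min}} + n^{-\gamma})\right).$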



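Finally, since $\mathrm{card}(\mathcal{N}_j) \to +\infty$ when $n \to +\infty$,

$\frac{E(\|\tilde f_{n,\gamma} - f\|_{\tilde\varphi}^2)}{\sum_{\lambda\in\Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n})} \ge (\sqrt{\gamma} - \sqrt{\gamma_{\min}})^2\, 2\log n\,(1 + o_n(1)).$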



Appendix 

The following table gives the definition of the signals used in Section 6.



Haarl 



$1_{[0,1]}$



Haar2 



$1.5\cdot 1_{[0,0.125]} + 0.5\cdot 1_{[0.125,0.25]} + 1_{[0.25,1]}$



Blocks 



1 lo,i] 
3.551 



32 

^ ho* 



Comb 



^ 1 [fc 2 /32,(fc 2 + fe)/32] 



Gauss 1 



$\frac{1}{0.25\sqrt{2\pi}}\exp\left(-\frac{(x-0.5)^2}{2\times 0.25^2}\right)$



Gauss2 



1 /(a;-0.5) 2 
l^ eXP V 2 x 0.25 2 



3 / (x - 5) 2 



2 x 0.25 2 



Beta0.5 



$0.5\,x^{-0.5}\,1_{]0,1]}$



Beta4 



$3x^{-4}\,1_{[1,+\infty[}$



Bumps 



284 



where 

P = 
h = 



[ 0.1 0.13 0.15 0.23 0.25 0.4 0.44 0.65 0.76 0.78 0.81 

| 4 -5 3 -4 5 -4.2 2.1 4.3 -3.1 2.1 -4.2 

[4 5 3 4 5 4.2 2.1 4.3 3.1 5.1 4.2 

I 0.005 0.005 0.006 0.01 0.01 0.03 0.01 0.01 0.005 0.008 0.005 